SEQ ID 8570 (GBS271) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 51 (lane 8; MW 31.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 55 (lane 6; MW 56.3kDa) and in Figure 
62 (lane 10; MW 56.3kDa). 

GBS271-GST was purified as shown in Figure 210, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 396 

A DNA sequence (GBSx0430) was identified in S.agalactiae <SEQ ID 1287> which encodes the amino 
acid sequence <SEQ ID 1288>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have an uncleavable N-term signal seq 
Final Results

bacterial membrane Certainty=0.3697 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC73593 GB:AE000155 putative metal resistance protein 
[Escherichia coli K12] 
Identities = 128/252 (50%) , Positives = 186/252 (73%) 

Query: 5 NSISLMSLLMASSLVLITLFFSYWQKLNLEKEVIISAIRAVIQLLAVGFLLDYIFGYQNP 64 

++I+ SL +A LV++ + S+ +KL LEK+++ S RA+IQL+ VG++L YIF + 
Sbjct: 13 HNITNESIALALMLVVVAILISHKEKLALEKDILWSVGRAIIQLIIVGYVLKYIFSVDDA 72

Query: 65 IFTALLMLFMIINASYNAAKRGKGINKGFVISFIAIGSGTIITLSVLIFSGILKFVPNQM 124 

T L++LF+ NA++NA KR K I K F+ SFIAI G ITL+VLI SG ++F+P Q+ 
Sbjct: 73 SLTLLMVLFICFNAAWNAQKRSKYIAKAFISSFIAITVGAGITLAVLILSGSIEFIPMQV 132 

Query: 125 IPVGGMIISNSMVAIGLCYKQLLSEFRSKQEEVETKLALGADILPASIDIIRDVIKTGMV 184 

IP+ GMI N+MVA+GLCY L S+Q++++ KL+LGA AS +IRD I+ ++

Sbjct: 133 IPIAGMIAGNAMVAVGLCYNNLGQRVISEQQQIQEKLSLGATPKQASAILIRDSIRAALI 192 

Query: 185 PTIDSAKTLGIVSLPGMMTGLILAGTSPIQAVKYQMMVTFMLLATTSIASFVATYLAYKI 244 

PT+DSAKT+G+VSLPGMM+GLI AG P++A+KYQ+MVTFMLL+T S+++ +A YL Y+
Sbjct: 193 PTVDSAKTVGLVSLPGMMSGLIFAGIDPVKAIKYQIMVTFMLLSTASLSTIIACYLTYRK 252 

Query: 245 FFNNRKQLWTK 256 

F+N+R QLWT+ 
Sbjct: 253 FYNSRHQLWTQ 264 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
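
The identity and positive-match figures quoted in alignment summaries such as the one in this example are ratios of matched positions to alignment length. As a rough sketch only (the helper `parse_alignment_summary` and its regular expression are assumptions introduced here for illustration, not part of the analysis described in this document), a summary line could be parsed and the ratios recomputed as follows:

```python
import re

# Hypothetical helper: matches "Identities = 128/252", "Positives = 186/252",
# and "Gaps = 2/202" fields in a BLAST-style summary line.
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (count, length, fraction)} for one summary line."""
    stats = {}
    for name, count, length in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, count / length)
    return stats


# Summary line copied from the alignment in this example.
line = "Identities = 128/252 (50%) , Positives = 186/252 (73%)"
stats = parse_alignment_summary(line)
for name, (count, length, fraction) in stats.items():
    print(f"{name}: {count}/{length} = {fraction:.3f}")
```

The printed integer percentages in the summaries need not equal the rounded ratios, since the reporting program may truncate rather than round.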
Example 397

A DNA sequence (GBSx0431) was identified in S.agalactiae <SEQ ID 1289> which encodes the amino 
acid sequence <SEQ ID 1290>. This protein is predicted to be SUGAR TRANSPORT ATP-BINDING
PROTEIN (b0490). Analysis of this protein sequence reveals the following:

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1903 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC73592 GB:AE000155 putative ATP-binding component of a 
transport system [Escherichia coli K12] 
Identities = 95/202 (47%) , Positives = 142/202 (70%) , Gaps = 2/202 (0%) 



Query: 4 LTFKHVDFKTDDKLVLNDINFAIDEGDFVSIVGPSGSGKSTVLKLASGLMSPTAGHIFFD 63

L ++V + D +LN+INF++ G+F I GPSG GKST+LK+ + L+SPT+G + F+
Sbjct: 8 LQLQNVGYLAGDAKILNNINFSLRAGEFKLITGPSGCGKSTLLKIVASLISPTSGTLLFE 67

Query: 64 GKDLNQLEPIESRKMISYCFQTPHLFGNTVEDNISFPYHIRHEKVDYRRVDDLFQRFEMD 123

G+D++ L+P R+ +SYC QTP LFG+TV DN+ FP+ IR+ + D D +RF +
Sbjct: 68 GEDVSTLKPEIYRQQVSYCAQTPTLFGDTVYDNLIFPWQIRNRQPDPAIFLDFLERFALP 127

Query: 124 QSYLKQDVKKLSGGEKQRIALIRQLLFEPKVLLLDEVTSALDNHNKAIVEKVI-KSLHDK 182

S L +++ +LSGGEKQRI+LIR L F PKVLLLDE+TSALD NK V ++I + + ++
Sbjct: 128 DSILTKNIAELSGGEKQRISLIRNLQFMPKVLLLDEITSALDESNKHNVNEMIHRYVREQ 187

Query: 183 GITILWITHDEEQSRRFANKVL 204

I +LW+THD+++ A+KV+
Sbjct: 188 NIAVLWVTHDKDEINH-ADKVI 208

A related DNA sequence was identified in S. pyogenes <SEQ ID 1291> which encodes the amino acid 
sequence <SEQ ID 1292>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2053 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 73/214 (34%) , Positives = 133/214 (62%) , Gaps = 9/214 (4%) 



Query: 4 LTFKHVD--FKTDDKLVLNDINFAIDEGDFVSIVGPSGSGKSTVLKLASGLMSPTAGHIF 61

+TF +V F+ VL +INF ++EG F +++G SGSGKST+L + +GL+ ++G I+
Sbjct: 6 ITFNNVSKTFEDSGTQVLKNINFDLEEGKFYTLLGASGSGKSTILNIMAGLLDASSGDIY 65

Query: 62 FDGKDLNQLEPIESRKMISYCFQTPHLFGN-TVEDNISFPYHIR--HEKVDYRRVDDLFQ 118

DG+ +N L PI R I FQ LF + TV +N++F ++ +K +RV + +
Sbjct: 66 LDGERINDL-PINKRD-IHTVFQNYALFPHMWBENVAFALKLKKVDKKEIAKRVKETLK 123

Query: 119 RFEMDQSYLKQDVKKLSGGEKQRIALIRQLLFEPKVLLLDEVTSALDNHNKAIVEKVIKS 178

++ + + + ++KLSGG++QR+A+ R ++ +P+V+LLDE SALD + ++ ++
Sbjct: 124 MVQL-EGFENRSIQKLSGGQRQRVAIARAIINQPRVVLLDEPLSALDLKLRTEMQYELRE 182

Query: 179 LHDK-GITILWITHDEEQSRRFANKVLKWNGSI 211

L + GIT +++THD+E++ ++ + + G I
Sbjct: 183 LQQRLGITFVFVTHDQEEALAMSDWIFVMNEGEI 216

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 398 

A DNA sequence (GBSx0432) was identified in S.agalactiae <SEQ ID 1293> which encodes the amino
acid sequence <SEQ ID 1294>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.0658 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
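
Each "Final Results" block in these examples lists a certainty value per predicted location, and the location marked "Affirmative" is the one with the highest value. The following is a minimal sketch of reading such a block programmatically; the function name `parse_localization` and the tolerant regular expression (which accepts stray spaces inside the numbers) are assumptions made for illustration and do not describe the prediction software itself:

```python
import re

# Hypothetical pattern: tolerates spacing damage such as "0. 0658" or "0 . 0000".
LOC_RE = re.compile(
    r"bacterial\s+(?P<site>\w+).*?Certainty\s*=\s*(?P<int>\d)\s*\.\s*"
    r"(?P<frac>\d[\d ]*?)\s*\((?P<call>[^)]*)\)"
)


def parse_localization(text):
    """Return {site: (certainty, call)} for every localization line in text."""
    results = {}
    for m in LOC_RE.finditer(text):
        value = float(m["int"] + "." + m["frac"].replace(" ", ""))
        results[m["site"]] = (value, m["call"].strip())
    return results


# Lines taken from this example (trailing status marker omitted).
report = """
bacterial cytoplasm Certainty=0. 0658 (Affirmative)
bacterial membrane Certainty=0. 0000 (Not Clear)
bacterial outside Certainty=0 . 0000 (Not Clear)
"""
scores = parse_localization(report)
best = max(scores, key=lambda site: scores[site][0])
print(best, scores[best])
```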

Example 399 

A DNA sequence (GBSx0434) was identified in S.agalactiae <SEQ ID 1295> which encodes the amino
acid sequence <SEQ ID 1296>. This protein is predicted to be DedA protein (dedA). Analysis of this protein
sequence reveals the following:

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.05 Transmembrane 186 - 202 ( 178 - 208)

INTEGRAL Likelihood = -8.81 Transmembrane 65 - 81 ( 61 - 89)

INTEGRAL Likelihood = -7.54 Transmembrane 26 - 42 ( 24 - 47)

INTEGRAL Likelihood = -0.37 Transmembrane 152 - 168 ( 152 - 168)

Final Results

bacterial membrane Certainty=0.5819 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC75377 GB:AE000320 orf , hypothetical protein [Escherichia coli K12] 
Identities = 91/211 (43%) , Positives = 131/211 (61%) , Gaps = 7/211 (3%) 

Query: 2 FLIDFILHIDTHIYAMANTVGNWTYLLLFLVIFVETGAVIFPFLPGDSLLFAAGALAANP 61 
FLIDFILHID H+ + G W Y +LFL++F ETG V+ PFLPGDSLLF AGALA+

Sbjct: 6 FLIDFILHIDVHLAELVAEYGVWVYAILFLILFCETGLVVTPFLPGDSLLFVAGALASLE 65

Query: 62 KMSFNI VTFLIIFFIAAFIGDSCNFLIGRTFGYRFIKHP FFRRFIKEKNIRDAELYF 118

N+ +++ IAA +GD+ N+ IGR FG + +P FRR +K ++
Sbjct: 66 TNDLNVH^lMvVLMLIAAIVGDAVNYTIGRLFGEKLFSNPNSKIFRRSYLDK THQFY 121

Query: 119 EKKGTAAIILGRYIPIIRTFVPFVAGISQLPPKVFIKRAFIAALSWSVIATGSGFLFGNI 178

EK G IIL R++PI+RTF PFVAG+ ++F IALW++T +G+ FG I

Sbjct: 122 EKHGGKTIILARFVPIVRTFAPFVAGMGHMSYRHFAAYNVIGALLWVLLFTYAGYFFGTI 181

Query: 179 PFVKQHFSLIILGIVFVTLIPVLISGVKSYR 209

P V+ + L+I+GI+ V+++P +I ++ R
Sbjct: 182 PMVQDNLKLLIVGIIVVSILPGVIEIIRHKR 212



WO 02/34771 



-504- 



PCT/GB01/04789 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
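
The "INTEGRAL ... Transmembrane" lines reported for this protein each give a likelihood score, a core segment, and an extended range in parentheses. As an illustrative sketch (the parser `parse_tm_segments` and the dictionary keys are hypothetical names chosen here, not output of the prediction program), the segments can be collected and ordered along the sequence like this:

```python
import re

# Hypothetical pattern for lines such as
# "INTEGRAL Likelihood =-12.05 Transmembrane 186 - 202 ( 178 - 208)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return predicted segments sorted by core start position."""
    segments = []
    for score, start, end, outer_start, outer_end in TM_RE.findall(text):
        segments.append({
            "likelihood": float(score),
            "core": (int(start), int(end)),
            "extended": (int(outer_start), int(outer_end)),
        })
    return sorted(segments, key=lambda seg: seg["core"][0])


# The four lines reported for this example.
report = """
INTEGRAL Likelihood =-12.05 Transmembrane 186 - 202 ( 178 - 208)
INTEGRAL Likelihood = -8.81 Transmembrane 65 - 81 ( 61 - 89)
INTEGRAL Likelihood = -7.54 Transmembrane 26 - 42 ( 24 - 47)
INTEGRAL Likelihood = -0.37 Transmembrane 152 - 168 ( 152 - 168)
"""
segments = parse_tm_segments(report)
for seg in segments:
    start, end = seg["core"]
    print(f"{start}-{end} length={end - start + 1} likelihood={seg['likelihood']}")
```

Sorting by position rather than by score makes it easier to compare the predicted segments with the alignment coordinates given for the same protein.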

Example 400 

A DNA sequence (GBSx0435) was identified in S.agalactiae <SEQ ID 1297> which encodes the amino
acid sequence <SEQ ID 1298>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3100 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 401 

A DNA sequence (GBSx0436) was identified in S.agalactiae <SEQ ID 1299> which encodes the amino
acid sequence <SEQ ID 1300>. This protein is predicted to be DNA-entry nuclease. Analysis of this protein 
sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3990 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9323> which encodes amino acid sequence <SEQ ID 9324> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA38134 GB:X54225 membrane nuclease [Streptococcus pneumoniae] 
Identities = 87/157 (55%) , Positives = 110/157 (69%) , Gaps = 1/157 (0%)

Query: 1 MLDRTIRQYQNRRDTTLPDANWKPLGWHQVAT-NDHYGHAVDKGHLIAYALAGNFKGWDA 59

+L + RQY+NR++T +W P GWHQV Y HAVD+GHL+ YAL G G+DA

Sbjct: 116 LLSKATRQYKNRKETGNGSTSWTPPGWHQVKNLKGSYTHAVDRGHLLGYALIGGLDGFDA 175

Query: 60 SVSNPQNVVTQTAHSNQSNQKINRGQNYYESLVRKAVDQNKRVRYRVTPLYRNDTDLVPF 119

S SNP+N+ QTA +NQ+ + + GQNYYES VRKA+DQNKRVRYRVT Y ++ DLVP
Sbjct: 176 STSNPKNIAVQTAWANQAQAEYSTGQNYYESKVRKALDQNKRVRYRVTLYYASNEDLVPS 235

Query: 120 AMHLEAKSQDGTLEFNVAIPNTQASYTMDYATGEITL 156

A +EAKS DG LEFNV +PN Q +DY TGE+T+
Sbjct: 236 ASQIEAKSSDGELEFNVLVPNVQKGLQLDYRTGEVTV 272

A related DNA sequence was identified in S.pyogenes <SEQ ID 1301> which encodes the amino acid 
sequence <SEQ ID 1302>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA38134 GB:X54225 membrane nuclease [Streptococcus pneumoniae]

Identities = 89/135 (65%) , Positives = 104/135 (76%) , Gaps = 1/135 (0%)

Query: 25 SPAGWHRLHHLKGSYDHAVDRGHLLGYALVGGLKGFDASTGNPDNIATQLSWANQANKPY 84
+P GWH++ +LKGSY HAVDRGHLLGYAL+GGL GFDAST NP NIA Q +WANQA Y
Sbjct: 138 TPPGWHQVKNLKGSYTHAVDRGHLLGYALIGGLDGFDASTSNPKNIAVQTAWANQAQAEY 197

Query: 85 LTGQNYYEGLVRRALDKGHRVRYRVTLLY-DGDNLLASGSHLEAKSSDDSLTFNVFVPNV 143

TGQNYYE VR+ALD+ RVRYRVTL Y ++L+ S S +EAKSSD L FNV VPNV
Sbjct: 198 STGQNYYESKVRKALDQNKRVRYRVTLYYASNEDLVPSASQIEAKSSDGELEFNVLVPNV 257

Query: 144 QAGLTADYRTGQIAI 158

Q GL DYRTG++ +
Sbjct: 258 QKGLQLDYRTGEVTV 272

An alignment of the GAS and GBS proteins is shown below:

Identities = 73/135 (54%) , Positives = 92/135 (68%) , Gaps = 2/135 (1%)

Query: 24 PLGWHQVA-TNDHYGHAVDKGHLIAYALAGNFKGWDASVSNPQNVVTQTAHSNQSNQKIN 82
P GWH++ Y HAVD+GHL+ YAL G KG+DAS NP N+ TQ + +NQ+N+

Sbjct: 26 PAGWHRLHHLKGSYDHAVDRGHLLGYALVGGLKGFDASTGNPDNIATQLSWANQANKPYL 85

Query: 83 RGQNYYESLVRKAVDQNKRVRYRVTPLYRNDTDLVPFAMHLEAKSQDGTLEFNVAIPNTQ 142

GQNYYE LVR+A+D+ RVRYRVT LY D +L+ HLEAKS D +L FNV +PN Q
Sbjct: 86 TGQNYYEGLVRRALDKGHRVRYRVTLLYDGD-NLLASGSHLEAKSSDDSLTFNVFVPNVQ 144

Query: 143 ASYTMDYATGEITLN 157

A T DY TG+I +N
Sbjct: 145 AGLTADYRTGQIAIN 159

SEQ ID 9324 (GBS656) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 186 (lane 10; MW 57kDa). 

GBS656-GST was purified as shown in Figure 236, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
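
In the alignments shown in these examples, each "Query:" or "Sbjct:" row carries a start coordinate, a sequence segment (with "-" marking gaps), and an end coordinate, so the number of non-gap residues should equal end minus start plus one. The sketch below illustrates that consistency check; the helper `check_row` is a hypothetical name introduced here, and the "damaged" row is a deliberately altered copy used only to show a mismatch:

```python
import re

# Hypothetical pattern for rows such as "Query: 1 MLDR...WDA 59".
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)$")


def check_row(row):
    """Return (label, residues_found, residues_expected) for one alignment row."""
    match = ROW_RE.match(row.strip())
    if match is None:
        raise ValueError(f"unrecognised alignment row: {row!r}")
    label, start, seq, end = match.groups()
    residues = len(seq.replace("-", ""))  # gap characters carry no residue
    expected = int(end) - int(start) + 1
    return label, residues, expected


# Row taken from the GBS/S. pneumoniae alignment in this example.
intact = "Query: 1 MLDRTIRQYQNRRDTTLPDANWKPLGWHQVAT-NDHYGHAVDKGHLIAYALAGNFKGWDA 59"
# Hypothetical damage: one residue dropped, as can happen in text extraction.
damaged = intact.replace("GWHQV", "GHQV")

print(check_row(intact))
print(check_row(damaged))
```

A row whose residue count disagrees with its coordinates has most likely lost or gained characters during transcription and should be checked against the deposited sequence.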

Example 402

A DNA sequence (GBSx0437) was identified in S.agalactiae <SEQ ID 1303> which encodes the amino 
acid sequence <SEQ ID 1304>. Analysis of this protein sequence reveals the following: 



Possible site: 13 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9321> which encodes amino acid sequence <SEQ ID 9322>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1305> which encodes the amino acid 
sequence <SEQ ID 1306>. Analysis of this protein sequence reveals the following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5350 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 24/73 (32%) , Positives = 37/73 (49%) , Gaps = 2/73 (2%)

Query: 1 MFYMKLANRLSLAATIVNEANANSPFGIIIHSDK^ 60 

+ YMKLA L TI+ E + SPF I+H+D A N++ E N +++P 

Sbjct: 80 ILYMKLAKENHLPVTIITETHMTSPFAFILHTDHAINLKETRLEVILKQTKNDQLSKQTP 139

Query: 61 K--KSLWQHFFSQ 71 

+ KS W+ F + 
Sbjct: 140 EKTKSFWKRFLKK 152 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 403 

A DNA sequence (GBSx0438) was identified in S.agalactiae <SEQ ID 1307> which encodes the amino 
acid sequence <SEQ ID 1308>. This protein is predicted to be Isopentenyl-diphosphate delta-isomerase. 
Analysis of this protein sequence reveals the following:

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1649 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG20030 GB:AE005083 isopentenyl pyrophosphate isomerase; Idi
[Halobacterium sp. NRC-1]
Identities = 24/77 (31%) , Positives = 40/77 (51%) 

Query: 14 TGLTLNRDQNIPQGLFHLVVDVILFHEDGDVLMMKRHPKKKAFPAYFEATAGGSALKGEN 73 
TGL D + G+ H +LF EDG VL+ +R +K+ + +++ T ++G++

Sbjct: 42 TGLANRLDAHTGDGWHRAFTCLLFDErXSRVLLAQRADRKRLWDTHWDGTVASHPIEGQS 101 

Query: 74 AKQAILRELKEETGIVP 90 
A + L EE GI P 
Sbjct: 102 QVDATRQRLAEELGIEP 118



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 404 

A DNA sequence (GBSx0439) was identified in S.agalactiae <SEQ ID 1309> which encodes the amino 
acid sequence <SEQ ID 1310>. This protein is predicted to be phosphoserine phosphatase (serB). Analysis
of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.0613 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB50876 GB:AL096844 putative phosphoserine phosphatase 
[Streptomyces coelicolor A3 (2) ] 
Identities = 96/193 (49%) , Positives = 132/193 (67%) 

Query: 5 LLVMDVDSTLIMEEAIDLLAIEAGVGKQVAALTDAAMRGELDFEEALKKRVALLKGLPVT 64

L+VMDVDSTLI +E I+L A AG +VA +T AAMRGELDFE++L RVALL GL +
Sbjct: 183 LVVMDVDSTLIQDEVIELFAAHAGCEDEVAEVTAAAMRGELDFEQSLHARVALLAGLDAS 242

Query: 65 ILTDILSSIHFTPGAYELIKECHKRQMKVGLVSGGFHET1DILAKQLQVDYVKANRLGVK 124
++ + + + TPGA LI+ + +VG+VSGGF + D L +QL +D+ +AN L +

Sbjct: 243 VVDKVRAEVVLTPGARTLIRTLKRLGYQVGVVSGGFTQVTDALQEQLGLDFAQANTLEIV 302

Query: 125 GGFLTGEVEGEIVTKEVKKIKLKEWASENHLDLSQTIAMGDGANDLPMIKSAGVGIAFCA 184
G LTG V GEIV + K L+ +A+ + LSQT+A+GDGANDL M+ +AG+G+AF A
Sbjct: 303 DGRLTGRVTGEIVnRAGKARLLRRFAAAAGVPLSQTVAIGDGANDLDMtNAAGLGVAFNA 362

Query: 185 KPIVREEAAYQIN 197

KP+VRE A +N
Sbjct: 363 KPVVREAAHTAVN 375



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 405 

A DNA sequence (GBSx0440) was identified in S.agalactiae <SEQ ID 1311> which encodes the amino
acid sequence <SEQ ID 1312>. Analysis of this protein sequence reveals the following: 



Possible site: 23 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-17.88 Transmembrane 5 - 21 ( 1 - 29) 



Final Results 

bacterial membrane Certainty=0.8153 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06924 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 122/553 (22%) , Positives = 265/553 (47%) , Gaps = 12/553 (2%)

Query: 7 LLLVAIVLLVIIAYVVGVVIRKRNDTLIANLETRKQELVDLPVQEEIEQVKLLHLIGQSQ 66

+++ ++++L + +V G + RK + LE K +++ P+ +EI +VK L + G+++ 

Sbjct: 3 IWFSLLVLTOTPFTOGALRRKaFYKRVDKLEDWKMDILQRPIPDEIGKVKGLTMSGETE 62 

Query: 67 STFREWNQKWTDLSTNSFKDIDFHLVEAENLNDSFNFVRAKHEIDNVDSQLTIIEEDIVS 126

F W W D+ +++ L + E+ + + F +AK +D ++ +L IEE + 

Sbjct: 63 EKFEVWRSDWDDIVGVILP^rVEEQLFDVEDFANKYRFQKAKALLDTIEQRLHSIEEQLKI 122 

Query: 127 IREALEVLKEQEEKNSARVTHALDLYETLQKSISEKEDNYGTTMPEIEKQLKNIEAEFSH 186
+ + ++VL + EE+N + +L + L K + + ++ +++L+

Sbjct: 123 ^WDDIQVLVQSEEQNRTEIGSWELQQKLIKEAITRRGSLSSSAKVFDEKLEKANELLQA 182 

Query: 187 FVTLNSTGDPIEASEVLNKAEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLETGYRRLL 246
F G+ I+ASEVL +A+E + + + +P + +L+ + P +L +L+ G R +

Sbjct: 183 FDERTEKGNYIQASEVLEEAKELLGQIEHLLKIVPGLFVELQTNIPAELTNLKNGLRDME 242

Query: 247 EENYHFPEKDIEQRFQEVREAIRSNSDGLVSLDLDRARDENEHIQEKIDKLYDIFEREIA 306 

E + 1+ + + + E + L L+ + +E I+E +++++++ E+E+ 

Sbjct: 243 EAGFFLETFAIDSQMERLEEKRVELLEQLTVLECNGMEEEINFIEESMEQMFELLEKEVE 302 


Query: 307 AYKVAHKDSKIIPQFLAHAKSNNEQLGH EIKRLSAKYILNENESLSLRSFTNDLEEI 363 

A ++ + ++P E+L H E++ YLEE+ + +L+E+ 

Sbjct: 303 A KNEIT1LLPNLREDLTKTEEKLTHLKEETESVQLSYRLAEEELVFQQKLGKELKEL 359 

Query: 364 ETKVLPSVENFGQEASPYTHLQILFERTLKTLTTVEENQMEVFEAVKTIESVETRARQNM 423

++ E ++ ++ ++ + E + LT + + E++ ++ E +A++ + 
Sbjct: 360 RQQLQVIDEVTEEQKQTFSSVRSMLEEWREGLTACQNKIEQAQESLNSLRKDELKAKEEL 419 

Query: 424 DKYVNKLHMIKRFMEKRNLPGIPQDFLSTFFTTSSQIEALINELSRGRIDIEAVSRLNDV 483 
+ KL KR ++K N+PG+P+ L ++ I +LS +++ V+ L D

Sbjct: 420 KQLKEKLLEDKRLVQKSNIPGLPETLLHRLEDGEQKIAQAIAKLSDVPLEMGRVTALVDE 479

Query: 484 TTNAIANLEQATYLWQDATLTEQLLQYSNRYRSFEQNVQKSFEQALYLFEVEHNYKASF 543 
I + ++ A L E ++QY NRYRS V+K A LF + 
Sbjct: 480 AQGLIHENSSILHETIEKARLAEHVIQYGNRYRSRSAEVKKRLSNAEELFRA FEY 534

Query: 544 DE-ISYALETVEP 555 

DE I A++ +EP 
Sbjct: 535 DEAIEMAVQAIEP 547 


A related DNA sequence was identified in S.pyogenes <SEQ ID 1313> which encodes the amino acid 
sequence <SEQ ID 1314>. Analysis of this protein sequence reveals the following: 

Possible site: 23 
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-18.04 Transmembrane 5 - 21 ( 1 - 29)

Final Results

bacterial membrane Certainty=0.8217 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB06924 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 131/555 (23%) , Positives = 269/555 (47%) , Gaps = 16/555 (2%) 

Query: 7 LLIVAIVLLVIIAYLVGVIIRKRNDSLITSLEERKQALFALPVNDEIEEVKSLHLIGQSQ 66

+++ ++++L + ++ G + RK + LE+ K + P+ DEI +VK L + G+++ 

Sbjct: 3 IWFSLLvTiTVTFFVYGALRRKAFYKRVDKLEDWKNDILQRPIPDEIGKVKGLTMSGETE 62 

Query: 67 TSFREWNQKWVDLTVNSFADIENHIFEAENLNDTFNFIRAKHEINSVESQLNLVEEDIAS 126

F W W D+ ++E +F+ E+ + + F +AK ++++E +L+ +EE + 

Sbjct: 63 EKFEVTOSDWDDIVGVILPNVEEQLFDvEDFANKYRFQKAKALLDTIEQRLHSIEEQLKI 122 

Query: 127 IREALNILKEQEEKNSARVTHALDLYEKLQASISENEDNFGSTMPEIDKQMKNIETEFSQ 186
+ + + +L + EE+N + +L +KL + S+ D++++
Sbjct: 123 MVDDIQVLVQSEEQNRTEIGSWELQQKLIKEAITRRGSLSSSAKVFDEKLEKANELLQA 182

Query: 187 FVALNSSGDPVEASEVLDRAEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLETGYRRLL 246 

F G+ ++ASEVL+ A+E + + + +P + +L+ + P +L +L+ G R + 

Sbjct: 183 FDERTEKGNYIQASEVLEEAKELLGQIEHLLKIVPGLFVELQTNIPAELTNLKNGLRDME 242 

Query: 247 EENYHFPEKNIEARFQEIRESIRANSSELVTLDLDRAREENTHIQERIDSLYEVFEREIA 306 

E + + + E +L L+ + EE I+E ++ ++E+ E+E 

Sbjct: 243 EAGFFLETFAIDSQMERLEEKRVELLEQLTVLECNGMEEEINFIEESMEQMFELLEKE-- 300 

Query: 307 AYKVAAKN--SKMLPRYLEHVKRNNEQ LKDEIARLSRKYILSETESLTVKAFEKDIK 361

V AKN + +LP E + + E+ LK+E + Y L+E E + + K++K 
Sbjct: 301 ---VEAKNEITILLPNLREDLTKTEEKLTHLKEETESVQLSYRLAEEELVFQQKLGKELK 357 

Query: 362 EIEDSTIAVAEQFGLQEKPFSELQVTFERSIKTLTNVESGQMDVFAAVKDIEKIESQARH 421

E+ + E Q++ FS ++ E + LT ++ ++ t I B +A+ 

Sbjct: 358 ELRQQLQVIDEVTEEQKQTFSSVRSMLEEWREGLTACQNKIEQAQESLNSLRKDELKAKE 417 

Query: 422 NLDVYVTQLHMIKRYMEKRHLPGIPQDFLSAFFTTSSQLEALMDELSRGRINIEAVSRLS 481 
L +L KR ++K ++PG+P+ L +L + +LS + + V+ h

Sbjct: 418 ELKQLKEKLLEDKRLVQKSNIPGLPETLLHRLEDGEQKLAQAIAKLSDVPLEMGRVTALV 477 

Query: 482 EVATVAIANLEDLTYQWQNATLTEQLLQYSNRYRSFEAGVQSSFEHALRLFEVENDYQA 541 
+ A I +++++ALE ++QY NRYRS A V+ +A LF 
Sbjct: 478 DEAQGLIHENSSILHETIEKARLAEHVIQYGNRYRSRSAEVKKRLSNAEELFRA F 532

Query: 542 SFDE-ISYALETVEP 555

+DE I A++ +EP 
Sbjct: 533 EYDEAIEMAVQAIEP 547 


An alignment of the GAS and GBS proteins is shown below: 
Identities = 429/574 (74%) , Positives = 503/574 (86%) 

Query: 1 MSSGIILLLVAIVLLVIIAYVVGVVIRKRNDTLIANLETRKQELVDLPVQEEIEQVKLLH 60
MSSGIILL+VAIVLLVIIAY+VGV+IRKRND+LI +LE RKQ L LPV +EIE+VK LH

Sbjct: 1 MSSGIILLIVAIVLLVIIAYLVGVIIRKRNDSLITSLEERKQALFALPVNDEIEEVKSLH 60

Query: 61 LIGQSQSTFREWNQKWTDLSTNSFKDIDFHLVEAENLNDSFNFVRAKHEIDNVDSQLTII 120
LIGQSQ++FREWNQKW DL+ NSF DI+ H+ EAENLND+FNF+RAKHEI++V+SQL ++
Sbjct: 61 LIGQSQTSFREWNQKWVDLTVNSFADIENHIFEAENLNDTFNFIRAKHEINSVESQLNLV 120

Query: 121 EEDIVSIREALEVLKEQEEKNSARVTHALDLYETLQKSISEKEDNYGTTMPEIEKQLKNI 180 

EEDI SIREAL +LKEQEEKNSARVTHALDLYE LQ SISE EDN+G+TMPEI+KQ+KNI
Sbjct: 121 EEDIASIREALNILKEQEEKNSARVTHALDLYEKLQASISENEDNFGSTMPEIDKQMKNI 180 


Query: 181 EAEFSHFVTLNSTGDPIEASEVLNKAEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLET 240 

E EFS FV LNS+GDP+EASEVL++AEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLET 
Sbjct: 181 ETEFSQFVALNSSGDPVEASEVLDRAEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLET 240 

Query: 241 GYRRLLEENYHFPEKDIEQRFQEVREAIRSNSDGLVSLDLDRARDENEHIQEKIDKLYDI 300

GYRRLLEENYHFPEK+IE RFQE+RE+IR+NS LV+LDLDRAR+EN HIQE+ID LY++ 
Sbjct: 241 GYRRLLEENYHFPEKNIEARFQEIRESIRANSSELVTLDLDRAREENTHIQERIDSLYEV 300 

Query: 301 FEREIAAYKVAHKDSKIIPQFLAHAKSNNEQLGHEIKRLSAKYILNENESLSLRSFTNDL 360 
FEREIAAYKVA K+SK++P++L H K NNEQL EI RLS KYIL+E ESL++++F D+

Sbjct: 301 FEREIAAYKVAAKNSKMLPRYLEHVKRNNEQLKDEIARLSRKYILSETESLTVKAFEKDI 360 

Query: 361 EEIETKVLPSVENFGQEASPYTHLQILFERTLKTLTTVEENQMEVFEAVKTIESVETRAR 420 
+EIE L E FG + P++ LQ+ FER++KTLT VE QM+VF AVK IE +E++AR 
Sbjct: 361 KEIEDSTIAVAEQFGLQEKPFSELQVTFERSIKTLTNVESGQMDVFAAVKDIEKIESQAR 420

Query: 421 QNMDKYVNKLHMIKRFMEKRNLPGIPQDFLSTFFTTSSQIEALINELSRGRIDIEAVSRL 480

N+D YV +LHMIKR+MEKR+LPGIPQDFLS FFTTSSQ+EAL++ELSRGRI+IEAVSRL
Sbjct: 421 HNLDVYVTQLHMIKRYMEKRHLPGIPQDFLSAFFTTSSQLEALMDELSRGRINIEAVSRL 480



Query: 481 NDVTTNAIANLEQATYLWQDATLTEQLLQYSNRYRSFEQNVQKSFEQALYLFEVEHNYK 540
++V T AIANLE TY WQ+ATLTEQLLQYSNRYRSFE VQ SFE AL LFEVE++Y+
Sbjct: 481 SEVATVAIANLEDLTYQWQNATLTEQLLQYSNRYRSFEAGVQSSFEHALRLFEVENDYQ 540

Query: 541 ASFDEISYALETVEPGVTDRFVTSYEKTQERIRF 574 

ASFDEISYALETVEPGVTDRFV SYEKT+E IRF
Sbjct: 541 ASFDEISYALETVEPGVTDRFVNSYEKTREHIRF 574

SEQ ID 1312 (GBS642) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 142 (lane 2-4; MW 27kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 406 

A DNA sequence (GBSx0441) was identified in S.agalactiae <SEQ ID 1315> which encodes the amino 
acid sequence <SEQ ID 1316>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2471 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9671> which encodes amino acid sequence <SEQ ID 9672>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA91553 GB:Z67740 DNA gyrase [Streptococcus pneumoniae] 
Identities = 574/650 (88%) , Positives = 618/650 (94%) , Gaps = 2/650 (0%) 



Query: 1 MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGMYIGSTSKEGLHHLVWEIVDNSIDEA 60

MTEE KN++ AQ+YDASQIQVLEGLEAVRMRPGMYIGSTSKEGLHHLVWEIVDNSIDEA
Sbjct: 1 MTEEIKNLQ--AQDYDASQIQVLEGLEAVRMRPGMYIGSTSKEGLHHLVWEIVDNSIDEA 58

Query: 61 LAGFAGHIKVYIEPDNSITVVDDGRGIPVDIQEKTGRPAVETVFTVLHAGGKFGGGGYKV 120

LAGFA HI+V+IEPD+SITVVDDGRGIPVDIQEKTGRPAVETVFTVLHAGGKFGGGGYKV
Sbjct: 59 LAGFASHIQVFIEPDDSITVVDDGRGIPVDIQEKTGRPAVETVFTVLHAGGKFGGGGYKV 118

Query: 121 SGGLHGVGSSVVNALSTQLDVKVYKNGKVHYQEYQRGVVVNDLEIIGDTDLSGTTVHFTP 180

SGGLHGVGSSVVNALSTQLDV V+KNGK+HYQEY+RG VV DLE++GDTD +GTTVHFTP
Sbjct: 119 SGGLHGVGSSVVNALSTQLDVHVHKNGKIHYQEYRRGHVVADLEVVGDTDRTGTTVHFTP 178

Query: 181 DPEIFTETTVFDFDKLAKRIQELAFLNRGLRISISDKREGQEVEKEYHYEGGIGSYVEFI 240

DPEIFTETT+FDFDKL KRIQELAFLNRGL+ISI+DKR+G E K YHYEGGI SYVE+I
Sbjct: 179 DPEIFTETTIFDFDKLNKRIQELAFLNRGLQISITDKRQGLEQTKHYHYEGGIASYVEYI 238

Query: 241 NENKEVIFENPIYTDGELDGISVEVAMQYTTGYQETVMSFANNIHTHEGGTHEQGFRTAL 300

NENK+VIF+ PIYTDGE+D I+VEVAMQYTTGY E VMSFANNIHTHEGGTHEQGFRTAL
Sbjct: 239 NENKDVIFDTPIYTDGEMDDITVEVAMQYTTGYHENVMSFANNIHTHEGGTHEQGFRTAL 298

Query: 301 TRVINDYAKKNKILKENEDNLTGEDVREGLTAVISVKHPNPQFEGQTKTKLGNSEVVKIT 360

TRVINDYA+KNK+LK+NEDNLTGEDVREGLTAVISVKHPNPQFEGQTKTKLGNSEVVKIT
Sbjct: 299 TRVINDYARKNKLLKDNEDNLTGEDVREGLTAVISVKHPNPQFEGQTKTKLGNSEVVKIT 358

Query: 361 NRLFSEAFNRFLLENPQVAKKIVEKGILASKARIAAKRAREVTRKKSGLEISNLPGKLAD 420

NRLFSEAF+ FL+ENPQ+AK+IVEKGILA+KAR+AAKRAREVTRKKSGLEISNLPGKLAD
Sbjct: 359 NRLFSEAFSDFL^NPQIAKRIVEKGILAAKARVAAKRAREVTRKKSGLEISNLPGKLAD 418

Query: 421 CSSNNAEMNELFIVEGDSAGGSAKSGRNREFQAILPIRGKILNVEKATMDKILANEEIRS 480

CSSNN ELFIVEGDSAGGSAKSGRNREFQAILPIRGKILNVEKA+MDKILANEEIRS
Sbjct: 419 CSSNNPAETELFIVEGDSAGGSAKSGRNREFQAILPIRGKILNVEKASMDKILANEEIRS 478

Query: 481 LFTAMGTGFGADFDVSKVRYQKLVIMTDADVDGAHIRTLLLTLIYRFMRPVLEAGYVYIA 540

LFTAMGTGFGA+FDVSK RYQKLV+MTDADVDGAHIRTLLLTLIYR+M+P+LEAGYVYIA
Sbjct: 479 LFTAMGTGFGAEFDVSKARYQKLVLMTDADVDGAHIRTLLLTLIYRYMKPILEAGYVYIA 538

Query: 541 QPPIYGVKVGSEIKAYIQPGVNQEEELRQALDTYSSGRSKPTVQRYKGLGEMDDHQLWET 600

QPPIYGVKVGSEIK YIQPG +QE +L++AL YS GR+KPT+QRYKGLGEMDDHQLWET
Sbjct: 539 QPPIYGVKVGSEIKEYIQPGADQEIKLQEALARYSEGRTKPTIQRYKGLGEMDDHQLWET 598

Query: 601 TMDPENRLMARVSVDDAAEADKIFDMLMGDRVEPRREFIEANAVYSNLDI 650

TMDPE+RLMARVSVDDAAEADKIFDMLMGDRVEPRREFIE NAVYS LD+
Sbjct: 599 TMDPEHRLMARVSVDDAAEADKIFDMLMGDRVEPRREFIEENAVYSTLDV 648

A related DNA sequence was identified in S.pyogenes <SEQ ID 1317> which encodes the amino acid 
sequence <SEQ ID 1318>. Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1698 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 584/650 (89%) , Positives = 618/650 (94%)

Query: 1 NWEETKNMEQRAQEYDASQIQVIiEGLEAvRMRPGIWIGSTSKEGLHHLWIEIVDNSIDEA 60 

M EE K+ E++ QEYDASQIQVLEGLEAVRMRPGMYIGST+KEGLHHLVWEIVDNSIDEA 
Sbjct: 1 MIEENKHFEKKMQEYDASQIQVLEGLEAVRMRPGMYIGSTAKEGLHHLVWEIVDNSIDEA 60 

30 

Query: 61 LAGFAGHIKVYIEPDNSITWDTCRGIPVniQEKTGRPAvETVT'TvLHAGGKFGGGGYKV 120 

LAGFA HIKV+IE DNSITWDDGRGIPVDIQ KTGRPAVETVFTVLHAGGKFGGGGYKV 
Sbjct: 61 IAGFASHIKVFIFADNSITvVDDGRGIPvDIQAKTGRPAVETVFTVI.HAGGKFGGGGYCT 120 

35 Query: 121 SGGLHGVGSSVVWALSTQLDVKVYKNGKVHYQEYQRGVVVNDLEIIGDTDLSGTTVHFTP 180 

SGGLHGVGSSWNALSTQLDV+VYKNG++HYQE++RG W DLE+IG TD++GTTVHFTP 
Sbjct: 121 SGGLHGVGSSvYNALSTQLDVRVYKNGQIHYQEFKRGAWADLEVIGTTDVTGTTVHFTP 180 

Query: 181 DPEIFTETWFDFDKIjyKRIQELAFIiNRGLRISISDKREGQEVEKEYHYEGGIGSYVEFI 240 
40 DPEIFTETT FD+ LAKRIQEIAFI.NRGL+ISI+DKR G E E+ + YEGGIGSYVEF+ 

Sbjct: 181 DPEIFTETTQFDYSVLAKRIQEIAFLNRGIiKISITDKRSGMEQEEHFLYEGGIGSYVEFL 240 

Query: 241 NENKEVIFENPIYTDGELDGISVEVAMQYTTGYQETVMSFANNIHTHEGGTHEQGFRTAL 300 
N+ K+VIFE PI YTDGEL+GI +VEVAMQYTT YQETVMSFANNIHTHEGGTHEQGFR AL 
45 Sbjct: 241 NDKCTVIFETPIYTDGELEGIAVEVAMQYTTSYQETVMSFANNIHTHEGGTHEQGFRAAL 300 

Query: 301 TRVINDYAKKNKILKENEDNLTGEDWEGLTAVISVKHPNPQFEGQTKTKLGNSEVVKIT 360 

TRVINDYAKKNKILKENEDNLTGEDVREGLTAVISVKHPNPQFEGQTKTKLGNSEWKIT 
Sbjct: 301 TRVINDYAKKNKILKENEDNLTGEDVREGLTAVISVKHPNPQFEGQTKTKLGNSEVVKIT 360 



50 



Query: 361 NRLFSEAFNRFIiLENPQVAKKI VEKGI LASKARIAAKRAREVTRKKSGLE I SNLPGKLAD 420 

NRLFSEAF RFLLENPQVA+KIVEKGIIASKARIARKRAREVTRKKSGLEISNLPGKIiAD 
Sbjct: 361 NRLFSEAFQRFLLENPQVARKIVEKGILASKRRIAAKRAREVTRKKSGLEISNLPGKLiAD 420 



55 Query: 421 CSSNNAEMNELFI VEGDSAGGSAKSGRNREFQAILPIRGKIIjNVEKATMDKIIjANEEIRS 480 

CSSN+A NELFIVEGDSAGGSAKSGRNREFOAILPIRGKIIjNVEKATMDKIIiANEEIRS 
Sbjct: 421 CSSNDANQNELFIVBGDSAGGSAKSGRNREFQAILPIRGKILNVEKATMDKILANEEIRS 480 

Query: 481 LFTAMGTGFGADFDVSKATRYQKLVIMTDADVDGAHIRTLLLTLIYRFmPVLEAGYVYIA 540 
60 LFTAMGTGFGADFDVSK RYQKLVIMTDADVDGAH I RTLLLTLI YRFMRPVLEAGYVYI A 

Sbjct: 481 LFTAMGTGFGADFDVSKARYQKLVIMTDADVTX3AHIRTLLLTLIYRFMRPVLEAGYVYIA 540 

Query: 541 QPPIYGVTCVGSEIKAYIQPGVNQEEELRQALDTYSSGRSKPTVQRYKGLGEMDDHQLWET 600 
QPPIYGVKVGSEIK YIQPG++QE++L+ AL+ YS GRSKPTVQRYKGLGEMDDHQLWET 
65 Sbjct: 541 QPPIYGVKVGSEIKEYIQPGIDQEDQLKTALEKYSIGRSKPTVQRYKGLGEMDDHQLWET 600 
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Query: 601 TMDPENRLMARVSVDDAAEaDKIFDMLMGDRVEPRREFIEflNAVYSICLDI 650 

TMDPENRLMARV+VDDAAEADK+FDMLMGDRVEPRR+FIE NAVYS LDI 
Sbjct: 601 TMDPEmLMARVTVDDAAEADKVFDMLMGDR\7EPRRDFIEENAVySTLDI 650 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 407 

A DNA sequence (GBSx0442) was identified in S.agalactiae <SEQ ID 1319> which encodes the amino 
acid sequence <SEQ ID 1320>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3186 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA91552 GB:Z67740 unidentified [Streptococcus pneumoniae]

Identities = 82/142 (57%) , Positives = 105/142 (73%) 

Query: 45 LKESTADAIAYFIPEEADFLKEYKANEAKVLETPILFQGAKELLAKIQRQGSRNFLVSHR 104 
LK ST AI F P +FL++YK NEA+ LE PILF+G +LL I QG R+FLVSHR 
Sbjct: 2 LKVSTPFAIETFAPNLENFLEKYKENEARELEHPII.FEGVSDLLEDILNQGGRHFLVSHR 61

Query: 105 DNQVIVILEKTEIIDYFTEWTADNGFSRKPSPESMLYLKEKYQIDNCLVIGDRDIDKQA 164 

++QV+ ILEKT I YFTEWT+ +GF RKP+PESMLYL+EKYQI + LVIGDR ID +A 
Sbjct: 62 NDQvlEILEKTSIAAYFTEVVTSSSGFKRKPNPESMLYLREKYQISSGLVIGDRPIDIEA 121 



Query: 165 GESAGFDTLL VDGSKSLMEI IE 186 

G++AG DT L +L ++++ 

Sbjct: 122 GQAAGLDTHLFTS I VNLRQVLD 143 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1321> which encodes the amino acid
sequence <SEQ ID 1322>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2472 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 122/185 (65%) , Positives = 145/185 (77%) 

Query: 1 MNYHDYITOLGGTLLDNYESSTRAFVETLKEFGYCADHDSVYQKLKESTADAIAYFIPEE 60 
MNY DYIWDLGGTLLDNYE ST+AFV+TL F DHD+VYQKLKESTA A+A F P E 
Sbjct: 4 MNYQDYIWDLGGTLLDNYELSTQAFVQTIAFFSLPGDHI^VYQKLKESTAIAVAMFAPNE 63

Query: 61 ADFLKEYKANEAKVLETPILFQGAKELLAKIQRQGSRNFLVSHRDNQVIVILEKTEIIDY 120 

+F)j Y+ EA L PI GAKE+L KI GSRNFL+SHRD QV +LE+ ++ Y 
Sbjct: 64 PEFLHVYRLREADKrAQPIWCLGAKEILGKIATSGSRNFLISHRDCQVNQLLEQAGLLIY 123 



Query: 121 FTEWTADNGFSRKPSPESMLYLKEKYQIDNCLVIGDRDIDKQAGESAGFDTLLVDGSKS 180 

FTEWTA NGF+RKP+PES + YLKEKY I++ LVIGDR IDKQAG++AGF+TLLVDG K+ 
Sbjct: 124 FTEWTASNGFARKPNPESLFYLKEKYDINSGLVIGDRLIDKQAGQAAGFNTLLVDGRKN 183

Query: 181 LMEII 185
L+EI+ 

Sbjct: 184 LLEIV 188 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
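
The summary line that heads each alignment (for example "Identities = 82/142 (57%) , Positives = 105/142 (73%)") can be read programmatically. The following Python sketch is illustrative only and is not part of the original analysis: the function name `parse_alignment_summary` and the space-tolerant pattern are assumptions introduced here, and recomputing each percentage by integer truncation is an assumed rounding convention. It extracts the counts so that a printed percentage can be compared with the value implied by its own numerator and denominator.

```python
import re

# Matches e.g. "Identities = 82/142 (57%)", tolerating stray spaces left by text extraction.
_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {name: (matches, length, printed_pct, recomputed_pct)} for one summary line."""
    stats = {}
    for name, matches, length, printed in _STAT.findall(line):
        matches, length, printed = int(matches), int(length), int(printed)
        # Integer truncation is an assumption about how the percentages were rounded.
        recomputed = (100 * matches) // length
        stats[name] = (matches, length, printed, recomputed)
    return stats


summary = parse_alignment_summary("Identities = 82/142 (57%) , Positives = 105/142 (73%)")
for name, (matches, length, printed, recomputed) in summary.items():
    print(name, matches, length, printed, recomputed)
```

Where the printed and recomputed values differ, the difference may point to a transcription error in the counts or to a different rounding rule; the sketch cannot tell which.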

Example 408 

A DNA sequence (GBSx0443) was identified in S.agalactiae <SEQ ID 1323> which encodes the amino 
acid sequence <SEQ ID 1324>. This protein is predicted to be stage V sporulation protein E (rodA). 
Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.5458 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9669> which encodes amino acid sequence <SEQ ID 9670> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15838 GB:Z99123 alternate gene name: ipa-42d~similar to
cell-division protein [Bacillus subtilis]

Identities = 142/392 (36%) , Positives = 237/392 (60%) , Gaps = 23/392 (5%) 

Query: 10 QKSNYFKGQIDYAWIPVFFLLMIGLASIYVA-TiyiNDYPSNIYIAMFQQVSWIIMGCIIA 68 
Q+S +++G D + VFF+ I + SIY A Y + +1 QQ+ + ++G + 
Sbjct: 7 QQSPFYQG - - DLI FI FGVFF I - - 1 S WS I YAAGQFGQYGNTDWI QQIVFYLLGAVAI 59

Query: 69 FWMLFSTEFLWKATPYLYALGLTLMVLPLIFYSPQLFAAT- -GAKNWVTIGSVTLFQPS 126 

V++ F E L K + Y++ +G+ +++ I SP+ A GAK+W IG +T+ QPS 
Sbjct: 60 TVLLYFDLEQLEKLSLYIFIIGILSLIILKI- -SPESIAPVIKGAKSWFRIGRITI-QPS 116 

Query: 127 EFMKISYILMLSRITVSFHQKNRKTFQDDWKLL-GLFGLVTLPVMILLMLQKDLGTALVF 185 

EFMK+ I+ML+ + + K +T +DD LL + G+ +PV ++LM +D GTA + 
Sbjct: 117 EFMKVGLIMMLASVIGKANPKGVRTLRDDIHLLLKIAGVAVIPVGLILM--QDAGTAGIC 174 

Query: 186 LAILSGLILLSGISWWIILPILSTIVLFIASFLMIFISPNGKEWFYNLGMDTYQINRLSA 245

+ 1+ ++ +SGI+W +1 I + +L 1+ L++ I N + ++G+ YQI R+++ 
Sbjct: 175 MFIVLvWFMSGINWKLIAIIAGSGILLISIjILLVMI--NFPDVAKSVGIQDYQIKRVTS 232 

Query: 246 WIDPFSFAD SIAYQQTQGMVSIGSGGVTGKGFNILELSVPVRESDMIFTVIAENFGF 302 

W+ + + ++Q Q +++IGSGG+ G G + L++ VP +D IF++I E+FGF

Sbjct: 233 WVSASNETQEDSNDSWQVDQAIMAIGSGGILGNGISNLKVYVPESTTDFIFSIIGESFGF 292 

Query: 303 IGSAIVLGLYLI I IYRMLRIT- - IESNNQFYTFISTGFIMMIVFHVFENIGAAVGILPLT 360 
IG AIV+ ++ +IYR++ + I N+F +F G+ +IV H F+NIG +GI+P+T 
Sbjct: 293 IGCAIWIMFFFLIYRLVVLIDKIHPFNRFASFFCVGYTALIVIHTFQNIGMNIGIMPVT 352

Query: 361 GIPLPFISQGGSSLLSNLIGIGLVLSMSYQNT 392 

GIPL F+S GGSS LS LIG G+V + S Q T 
Sbjct: 353 GIPLLFVSYGGSSTLSTLIGFGI VYNASVQLT 384

There is also homology to SEQ ID 1028.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 409

A DNA sequence (GBSx0444) was identified in S.agalactiae <SEQ ID 1325> which encodes the amino 
acid sequence <SEQ ID 1326>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3195 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1327> which encodes the amino acid
sequence <SEQ ID 1328>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2735 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 38/55 (69%) , Positives = 48/55 (87%) 

Query: 8 DEFKEAIDKGYISGNTVAIVRKNGKIFDYVLLHEETOEEEVVTVERVLDVLRKLS 62

DEFK+AID GYI +G+TVAIVRK+G+ 1 FDYVL HE+V+ EWT E+V +VL +LS
Sbjct: 5 DEFKQAIDNGYIAGDWAIVRKDGQIFDYVLPHEKVKNGEVVTKEKVEEVLVELS 59

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 410 

A DNA sequence (GBSx0445) was identified in S.agalactiae <SEQ ID 1329> which encodes the amino 
acid sequence <SEQ ID 1330>. Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4241 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1331> which encodes the amino acid
sequence <SEQ ID 1332>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4551 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 57/66 (86%) , Positives = 63/66 (95%)

Query: 1 MSQEKLKSKLDQAKGGAKEGFGKITGDKELEAKGFIEKTIAKGKELADDAKDAVEGRVDA 60 

MS+EKLKSK++QA GG KEG GK+TGDKELEAKGF+EKTIAKGKELADDAK+AVEGAVDA 
Sbjct: 1 MSEEKLKSKIEQASGGLKEGAGKLTGDKELEAKGFVEKTIAKGKELADDAKEAVEGAVDA 60 

Query: 61 VKEKLK 66 

VKEKLK 
Sbjct: 61 VKEKLK 66 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 411 

A DNA sequence (GBSx0447) was identified in S.agalactiae <SEQ ID 1333> which encodes the amino 
acid sequence <SEQ ID 1334>. This protein is predicted to be TnpA (orfB). Analysis of this protein 
sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3961 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9667> which encodes amino acid sequence <SEQ ID 9668>
was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1335> which encodes the amino acid
sequence <SEQ ID 1336>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3365 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 152/160 (95%) , Positives = 154/160 (96%) 

Query: 1 MKNMALPKMATVTCTKTAL 60 
MKNMALPKMATVK KTALK+TQKTYPQNLLNQKFNPDKPNQVWSTDFTYI S IGYKKYVYL

Sbjct: 194 MKNMALPKMATVKPKTALKRTQKTYPQNLLMQKETIPDKPNQWSTDFTYISIGYKKyVYL 253 

Query: 61 CAIIDLYSRKYIAWKLSHRMDAKIACDTLELAIjNKRKIEGTLLFHSDQGSQFKAREFRKI 120 
CAI+DLYSRK IAWKLSHRMDAKLACDTLELAIiNKRKIEGTLLFHSDQGSQFKARE RKI 
Sbjct: 254 CAILDLYSRKCIAWKLSHRMDAKLACDTLELALNKRKIEGTLLFHSDQGSQFKARELRKI 313

Query: 121 IDDNNIMHSFSKPRYPYDNAVTEAFFKYLKHRQINQKNYQ 160

IDDN IMHSFSKP YPYDNAVTEAFFKYLKHRQINQK YQ 
Sbjct: 314 IDDNTIMHSFSKPGYPYDNAVTEAFFKYLKHRQINQKKYQ 353 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 412 

A DNA sequence (GBSx0448) was identified in S.agalactiae <SEQ ID 1337> which encodes the amino 
acid sequence <SEQ ID 1338>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1090 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
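
Each "Final Results" block above lists a certainty value for three candidate locations (cytoplasm, membrane, outside). As a hedged illustration only, the Python sketch below reads such lines and picks the location with the highest certainty. The names `parse_psort_line` and `top_localization` are hypothetical helpers introduced here, and the pattern is deliberately tolerant of spaces inside the number (for example "0 . 1090"), on the assumption that those spaces come from text extraction rather than from the prediction program.

```python
import re

# One "Final Results" line: location name, certainty value, and the call in parentheses.
# Spaces inside the number and a "{" in place of "(" are tolerated as extraction noise.
_PSORT_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s+Certainty\s*=\s*(?P<whole>\d)\s*\.\s*"
    r"(?P<frac>\d[\d ]*?)\s*[({]\s*(?P<call>[^)]+)\)"
)


def parse_psort_line(line):
    """Return (site, certainty, call) or None if the line is not a result line."""
    m = _PSORT_LINE.search(line)
    if m is None:
        return None
    certainty = float(m.group("whole") + "." + m.group("frac").replace(" ", ""))
    return m.group("site"), certainty, m.group("call").strip()


def top_localization(lines):
    """Return the parsed entry with the highest certainty, or None if nothing parses."""
    parsed = [p for p in map(parse_psort_line, lines) if p is not None]
    return max(parsed, key=lambda p: p[1]) if parsed else None


example = [
    "bacterial cytoplasm Certainty=0 . 1090 (Affirmative)",
    "bacterial membrane Certainty=0. 0000 (Not Clear)",
    "bacterial outside Certainty=0.0000 (Not Clear)",
]
print(top_localization(example))
```

The sketch only ranks the values as printed; it makes no claim about how the certainty scores themselves are derived.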

Example 413 

A DNA sequence (GBSx0449) was identified in S.agalactiae <SEQ ID 1339> which encodes the amino 
acid sequence <SEQ ID 1340>. This protein is predicted to be histidine kinase (resE). Analysis of this 
protein sequence reveals the following:

Possible site: 40

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.57 Transmembrane 17 - 33 ( 6 - 38)
INTEGRAL Likelihood = -4.67 Transmembrane 147 - 163 ( 142 - 166)

Final Results

bacterial membrane Certainty=0.5628 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD25109 GB:AF140356 VncS [Streptococcus pneumoniae] 
Identities = 178/435 (40%), Positives = 281/435 (63%), Gaps = 1/435 (0%) 

Query: 1 MKKLKIFPKMFIQIFSILGILIILVHSLFFFIFPKTYLETRKVKIHIMADEISKNMNGKE 60

MK+ +F K+FI FSI +L+I +H +F+FP TYL R+ I A I++++ GK+ 
Sbjct: 1 MKRTGLFAKIFIYTFSIFSVLVICLHLAIYFLFPSTYLSHRQETIGQKATAIAQSLEGKD 60 

Query: 61 LKYLDQTLELYSKSSDIKVFIKKNNNKNELQINDNINVNVKSDSNSLIIEEREIKLHDGK 120 
+ ++Q L+LYS++SDIK +K +++L++ D++ ++ + SL IEERE+K DG

Sbjct: 61 RQSIEQVLDLYSQTSDIKGTVKGEMTEDKLEVKDSLPLDTDRQTTSLFIEEREVKTQDGG 120 

Query: 121 KIHLQFVSTADMQKDAKDLSLKFLPYSLSISFLFSIVTSLIYAKSIKNNIQEITMVTDKM 180 
+ LQF+++ D+QK+A+ +SL+FLPY+L SFL S++++ IYA++I I EI VT +M 
Sbjct: 121 TMILQFLASMDLQKEAEQI SLQFLPYTLLASFLISLLVAYI YARTI VAPILE I KRVTRRM 180

Query: 181 IKLDKETRLKISSNDEIGQLKQQINDLYQVLJjNTINDL^ 240 

+ LD + RL++ S DEIG LK+QIN LY LL I DL KN+ IL+LEK+K +F +GAS 
Sbjct: 181 MDLDSQWLRVDSKDEIGNLKEQINSLYQHLLWIADLHEKNEAILQLEKMKVEFLRGAS 240 



Query: 241 HELKTPLSSLKILLENMKYNIGKYKDRDFYISECINIVDNLTKNVSQILSFYSIKDLNND 300

HELKTPL+SLKIL+ENM+ NIG+YKDRD Y+ + IVD L +V QILS S+++L +D 
Sbjct: 241 HELKTPLASLKILIENMRENIGRYKDRDQYLGVALGIVDELNHHVLQILSLSSVQELRDD 300 

Query: 301 EEYLNVGDTLDEVLEKYSILVNQKKININKELLDYNIYIGKTALNIVFSNLISNAVKYTN 360

E +++ +++ Y++L ++++ 1+ L Y+ + + ++ SNLISNA+K++ 

Sbjct: 301 RETIDLLQMTQNLVKDYALIAKERELQIDNSLTHQQAYLNPSVMKLILSNL1SNAIKHSV 360 

Query: 361 RNGIINIKIANDWLLIENSYDKNKISKINKILDASFDLKLDNSNGLGLNIVKNILNKYNI 420

G++ I L IENS + K+ + + K+ S G+GL +VK++L + 

Sbjct: 361 PGGLTOIGEREGELFIENSCSSEEQEKLAQSFSDNASRKVKGS-GMGLFVVKSLLEHEKL 419 

Query: 421 KYEILHGENYFI FKI 435 
Y EN F I

Sbjct: 420 AYRFEMEENSLTFFI 434 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1341> which encodes the amino acid 
sequence <SEQ ID 1342>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.83 Transmembrane 14 - 30 ( 6 - 35)
INTEGRAL Likelihood = -2.44 Transmembrane 157 - 173 ( 156 - 174)

Final Results

bacterial membrane Certainty=0.5734 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAD25109 GB:AF140356 VncS [Streptococcus pneumoniae]. 
Identities = 123/455 (27%) , Positives = 223/455 (48%) , Gaps = 23/455 (5%) 

Query: 3 LIKKTFLVINGLIIVVVTSILLVLYFAMPIYYTKVKDKEVKCEFDQTSKQIKGKTVTEIR 62

L K F+ + V+V + L +YF P Y + + + + ++ ++GK I 
Sbjct: 6 LFAKIFIYTFSIFSVLVICLHLAIYFLFPSTYLSHRQETIGQKATAIAQSLEGKDRQSIE 65 

Query: 63 DILTKKINKDNIWYSLVDSDNQLLYPSLQLLDGVSESKDSQNVNIVTTFDNSYSNVKVMS 122 
+L +1 ++ ++ L++ D + D Q ++ +

Sbjct: 66 QVLDLYSQTSDIKGTV KGEMTEDKLEVKDSLPLDTDRQTTSLF IEE 111 

Query: 123 QKVTLRDGKKMTLLGQSSLQPVTDASKVLLDLYPSLLIFSVTVGSIVAYLYSRTSSRRIL 182 
++V +DG , M L +S+ +A ++ L P L+ S + +VAY+Y+RT IL 
Sbjct: 112 REVKTQDGGTMILQFLASMDLQKEAEQISLQFLPYTLLASFLISLLVAYIYARTIVAPIL 171

Query: 183 SMSQTAKKMVNLEPNLTCTIHGKDEIAMLASDINRLYASLSTSIKSLQKEYEKASDSERE 242 

+ + ++M++L+ + + KDEI L IN LY L T I L ++ E E+ 
Sbjct: 172 EIKRVTRRMMDLDSQTOLRVDSKTJEIGNLKEQINSLYQHLLTVIADLHEKNEAILQLEKM 231 

Query: 243 KSEFLRMTSHELKTPITSVIGMIDGMLYNVGDFADRDKYLRKCRDVLEGQAQLVQSILSL 302 

K EFLR SHELKTP+ S+ +1+ M N+G + DRD+YL +++ V ILSL 

Sbjct: 232 KVEFLRGASHELKTPLASLKILIENMRENIGRYKDRDQYLGVALGIVDELNHHVLQILSL 291 

Query: 303 SKIETIASQNQELFSLKSSLEEEMEVFLVLSELKHLKOTINLEEQFVKANKVYLLKAIKN 362

S ++ L ++E L + ++ + +L++ + L++ +L Q N + + N 
Sbjct: 292 SSVQEL-RDDRETIDLLQMTQNLVKDYALLAKERELQIDNSLTHQQAYLNPSVMKLILSN 350 

Query: 363 IIDNAFHYTKSGGQVMIQLKDNQLVIKNEAETIiLTQQQMKQLFQPFYRPDYSRNRKDGGT 422 
+I NA ++ GG V I ++ +L I+N + ++ ++L Q F + +RK G+

Sbjct: 351 LISNAIKHSVPGGLVRIGEREGELFIENSC SSEEQEKLAQSF SDNASRKVKGS 403 

Query: 423 GLGLFITHQILDQHHLAYRFWLDQRWMVFTIDFP 457 
G+GLF+ +L+ LAYRF +++ + F IDFP 
Sbjct: 404 GMGLFWKSLLEHEKLAYRF - EMEENSLT F F I D FP 437

An alignment of the GAS and GBS proteins is shown below:

Identities = 108/454 (23%) , Positives = 220/454 (47%) , Gaps = 22/454 (4%) 

Query: 4 LKIFPKMFIQIFSILGILIILVHSLFFFIFPKTYLETRKVKIHIMADEISKNMNGKELKY 63

+++ K F+ I ++ +++ + + +F P Y + + ++ D+ SK + GK + 
Sbjct: 1 TOLIKKTFLVINGLIIVWTSILLVLYFAMPIYYTKVKDKEVKCEFDQTSKQIKGKTVTE 60 

Query: 64 LDQTLELYSKSSDIKVFIKKNNNK NELQINDNINVNVKSDSN--SLII 109 

+ L +1 + ++N+ +E + + N+N+ D++ ++ +

Sbjct: 61 IRDILTKKINKDNIWYSLVDSDNQLLYPSLQLLDGVSESKDSQNVNI VTTFDNSYSNVKV 120 

Query: 110 EEREI KLHDGKKIHLQFVSTADMQKDAKDLSLKFLPYSLS I SFLFS I VI SLI YAKS I KNN 169 
+++ L DGKK+ L S+ DA + L P L S +++ +Y+++ 

Sbjct: 121 MSQKVTLRDGKKMTLLGQSSLQPVTDASKVLLDI.YPSLLIFSVTVGSIVAYLYSRTSSRR 180

Query: 170 IQEITMVTDKMIKLDKETRLKISSNDEIGQLKQQINDLYCALLNTINDLEFKNKEILKLE 229 

I ++ KM+ L+ I DEI L IN LY +L +1 L+ + ++ E 

Sbjct: 181 ILSMSQTAKKMVNLEPNLTCTIHGKDEIMttASDINRLYASLSTSIKSLQKEYEKASDSE 240 

Query: 230 K1KYDFFKGASHELKTPLSSLKILLENMKYNIGKYKDRDFYISECINIVDNLTKNVSQIL 289 

+ K +F + SHELKTP++S+ +++ M YN+G + DRD Y+ +C ++++ + V IL 
Sbjct: 241 REKSEFLRMTSHELKTPITSVIGMIDGMLYNVGDFADRDKYLRKCRDVLEGQAQLVQSIL 300 

Query: 290 SFYSIKDL-NNDEEYLNVGDTLDEVLEKYSILVNQKKININKELLDYNIYIGKTALNIVF 348

S 1+ L + ++E ++ +L+E +E + +L K++ L++ KL 
Sbjct: 301 SLSKIETIASQNQELFSLKSSLEEEMEvFLVLSELKHLKVTINLEEQFVKANKVYLLKA.1 360 

Query: 349 SNLISNAVKYTNRNGIINIKIANDWLLIENSYDKNKISKINKILDASF DLKLDN 402 

N+I NA YT G + I++ ++ L+I+N + + K L F + D

Sbjct: 361 KNIIDNAFHYTKSGGQVMIQLKDNQLVIKNEAETIi^ 420 

Query: 403 SNGLGLNIVKNILNKYNIKYE-ILHGENYFIFKI 435 
GLGL I IL+++++ Y ++ + + +F I 
Sbjct: 421 GTGLGLFITHQILDQHHLAYRFVVLDQRWMVFTI 454

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 414 

A DNA sequence (GBSx0450) was identified in S.agalactiae <SEQ ID 1343> which encodes the amino
acid sequence <SEQ ID 1344>. This protein is predicted to be response regulator (regX3). Analysis of this 
protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 50 - 66 ( 50 - 66)

Final Results

bacterial membrane Certainty=0.1319 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9665> which encodes amino acid sequence <SEQ ID 9666> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD25108 GB:AF140356 VncR [Streptococcus pneumoniae]

Identities = 131/218 (60%) , Positives = 176/218 (80%) , Gaps = 1/218 (0%) 

Query: 5 MKILTVEDDKLIREGISEYLSEFGYTVIQAKDGREALSKFNS-DINLVILDIQIPFINGL 63
MKIL VED+++IREG+S+YL++ GY I+A DG+EAL +F+S ++ LV+LDIQ+P +NGL
Sbjct: 1 MKILIVEDESWIREGVSDYLTDCGYETIEflADGQEALEQFSSYEVTUjVLLDIQMPKCiNGL 60 

Query: 64 EVLKEIRKKSNLPILILTAFSDEEYKIDAFTNLVDGYVEKPFSLPVLKARIDSLIKKNFG 123 
EVL EIRK S +P+L+LTAF DEEYK+ AF +L DGY+EKPFSL +LK R+D++ K+ +

Sbjct: 61 EVLAEIRKTSQVPVLMLTAFQDEEYKMSAFASLADGYLEKPFSLSLLKVRVDAIFKRYYD 120 

Query: 124 HLEKFEYKNLSVNENSYTAKINDEKIDVMAKELEILKI^^ 183 
F YK+ V+F SY+A + +++ +NAKELEIL L+ N+G+ LTR QIID VWK + 
Sbjct: 121 TGRIFSYKDTKATOFESYSASLAGQEVPINAKELEILDYLVKNEGRALTRSQIIDAVWJCAT 180

Query: 184 EEIPYDRWDVYIKELRKKLQLDCITTIRNVGYKLERK 221 

+E+P+DRV+DVYIKELRKKL LDCI T+RNVGYKLERK 
Sbjct: 181 DEVPFDRVIDVYI KELRKKLDLDC I LTVRNVGYKLERK 218 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1345> which encodes the amino acid 
sequence <SEQ ID 1346>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.60 Transmembrane 48 - 64 ( 48 - 64)

Final Results

bacterial membrane Certainty=0.2041 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF72358 GB:AF192329 VanRB [Enterococcus faecalis]
Identities = 88/215 (40%), Positives = 128/215 (58%), Gaps = 2/215 (0%) 

Query: 3 KILWEDDDTISQVICEFLKANNYDPDCVFDGQAALDKWQTTSYDLIILDIMLPSLSGLE 62 

+IL+VEDDD I + FL YD DG A K+ +Y L+ILDIMLP ++G E 
Sbjct: 4 RILLvEDDDHIOm^GFLAEAGYQVDACTDGNEAYTKFYENTYQLVILDIMLPGMNGHE 63 

Query: 63 VLKTIRKTSDVPIIMLTALDDEYTQLVSFNHLISDYVTKPFSPLILIKRIENVLRVSTPD 122

+L+ R +D PI+M+TAL D+ Q+ +F+ DYVTKPF IL+KR+E +LR S 
Sbjct: 64 LLREFRAKNDTPILMMTALSDDENQIRAFDAEADDYVTKPFKMQILLKRVEALLRRSGAL 123 

Query: 123 EKR-QIGDLLVDETEHSVYWQGTLVKLTKKEYDIIDYIAKRHQKIVTRDQLMDDIWGYS- 180 
K ++G L + + +V GT + LT+KE++I+ L + + +T + ++ IWGY

Sbjct: 124 AKEIRVGRLTLLPEDFTVLCDGTELPLTRKEFEILLLLVQNKGRTLTHEIILSRIWGYDF 183 

Query: 181 ELDTRVLDNHIKNLRKKMTGIPLKTITGMGYLLGE 215 
ED + HIKNLR K+ +KTI G+GY L E 
Sbjct: 184 EGDGSTVHTHIKNLRAKLPENIIKTIRGVGYRLEE 218

An alignment of the GAS and GBS proteins is shown below: 

Identities = 80/214 (37%) , Positives = 126/214 (58%) , Gaps = 4/214 (1%) 

Query: 6 KILTVEDDKLIREGISEYLSEFGYTVIQAKDGREALSKFNS-DINLVILDIQIPFINGLE 64

KIL VEDD I + I E+L Y DG+ AL K+ + 4-L+ILDI +P ++GLE 

Sbjct: 3 KILVVEDDDTISQVICEFLKANNYDPDCVFDGCAALDKWQTTSYDLIILDIMLPSLSGLE 62 

Query: 65 VLKEIRKKSNLPILILTAFSDEEYKIDAFTNLVDGYVEKPFSLPVLKARIDSLIKKNFGH 124
VLK IRK S++PI++LTA DE ++ +F +L+ YV KPFS +L RI+++++ +

Sbjct: 63 VLKTIRKTSDVPIIMLTALDDEYTQLVSFNHLISDYVTKPFSPLILIKRIENVLRVSTPD 122 

Query: 125 LEKFEYKNLSVNFNSYTAKINDEKIDWAKELEILKmi£»NDGQVLTRMQIIDYVWKDSE 184 
EK + +L V+ ++ + + KE +1+ L +++TR Q++D +W SE 

Sbjct: 123 -EKRQIGDLLVDETEHSVYWC^TLVKLTKOTYDIIDYLAKRHQKIVTRDQLITODIWGYSE 181

Query: 185 EIPYDRWDVYIKELRKKLQLDCITTIRNVGYKL 218 

RV+D +IK LRKK+ + TI +GY L 
Sbjct: 182 --LDTRVLDNHIKNLRKKMTGIPLKTITGMGYLL 213

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 415 

A DNA sequence (GBSx0451) was identified in S.agalactiae <SEQ ID 1347> which encodes the amino 
acid sequence <SEQ ID 1348>. This protein is predicted to be Vexp3. Analysis of this protein sequence 
reveals the following: 

Possible site: 49 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-12.68 Transmembrane 423 - 439 ( 413 - 447)
INTEGRAL Likelihood =-10.67 Transmembrane 16 - 32 ( 12 - 37)
INTEGRAL Likelihood = -9.77 Transmembrane 303 - 319 ( 301 - 326)
INTEGRAL Likelihood = -3.13 Transmembrane 343 - 359 ( 343 - 367)

Final Results

bacterial membrane Certainty=0.6074 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD47594 GB:AF140784 Vexp3 [Streptococcus pneumoniae] 
Identities = 280/458 (61%) , Positives = 363/458 (79%) , Gaps = 3/458 (0%) 

Query: 1 MIKNAFAYVTRKSLKSLII1LVILSMATLSIISLSIKDATDRASKETFANITNSFSMEIN 60
M+ NAFAYVTRK KS++I L+IL MA+LS++ LSIK AT +AS+ETF NITNSFSM+IN
Sbjct: 1 MLHNAFAYVTRKFFKSIVIFLIILLMASLSLVGLSIKGATAKASQETFKNITNSFSMQIN 60

Query: 61 RQVNPGTPRGGGNVKGEDIKKISQTNSIDSYVKRINSVADLTOHDIIETQOTIiANQSPER 120
R+VN GTPRG GN+KGEDIKKI++ +I+SYVKRIN++ DL +D+IET +T N + +R
Sbjct: 61 RRVNQGTPRGAGNIKGEDIKKITENKAIESYVKRINAIGDLTGYDLIETPETKKNLTADR 120

Query: 121
AK F ++M+TGVNDS+KE KFVS +YKLVEG+HL N DK+KIL+HKDLA K+ KVGDK
Sbjct: 121

Query: 181
+K+ SN++DADNEK A ETVEV IKGLFDGHN V+ +QELYENT ITD+H+AAK+YG
Sbjct: 181

Query: 241
TEDTA+Y DATFFV DKNLD V+K+L G INW+ Y L+KSSSNYPAL+QSISG+Y +
Sbjct: 241

Query: 300
+N LF GSL F+ ++++LLL LW+NAR+KE+ +LLS+G+ + I GQFI E + I+IPAL
Sbjct: 301

Query: 360
+ +YFLA YTA +GN +L VT +AKQ ++ + +S LGGGAE +GF+KTLS LDI++
Sbjct: 361

Query: 419
FII V+ + V+LV + L+S LRK PKELL+D
Sbjct: 421


A related DNA sequence was identified in S.pyogenes <SEQ ID 1349> which encodes the amino acid 
sequence <SEQ ID 1350>. Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.90 Transmembrane 19 - 35 ( 16 - 43)
INTEGRAL Likelihood = -7.27 Transmembrane 371 - 387 ( 359 - 392)
INTEGRAL Likelihood = -7.01 Transmembrane 335 - 351 ( 326 - 357)
INTEGRAL Likelihood = -6.21 Transmembrane 282 - 298 ( 276 - 308)

Final Results

bacterial membrane Certainty=0.6158 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC24912 GB:AF012285 YknZ [Bacillus subtilis] 
Identities = 176/408 (43%) , Positives = 250/408 (61%) , Gaps = 16/408 (3%) 

Query: 1 MENWKFALSSIWGHKmSILTMLGIIIGVAAWIIMGLGNAMKNSVTSTFSSKQKDIQLY 60

+EN + ALSS+ HKMRSILTMLGIIIGV +V++++ +G + + + S ++LY 
Sbjct: 4 LENIRMALSSVLAHKMRS ILTMLGI I IGVGSVIWVAVGQGGEQMLKQS I SGPGNTVELY 63 

Query: 61 FQEKGEE--EDLYAGLHTHFJ^NHEVKPEWLEQIVKDIDGIDSYYFTNSATSTISYEKKKV 118 
+ EE + A + +++K +K I+GI + S + Y +++

Sbjct: 64 YMPSDEELASNPNAAAESTFTENDIKG LKGIEGIKQWASTSESMKARYHEEET 117 

Query: 119 DNASIIGVSKDYFNIKNYDIVAGRTLTDNDYSNFSRIILLDTVLADDLFGKGNYKSALNK 178 
D A++ G++ Y N+ + I +GRT TDND+ +R+ ++ +A +LF K S L + 
Sbjct: 118 D-ATVNGINDGYMNVNSLKIESGRTFTDNDFLAGNRVGIISQKMAKELFDK TSPLGE 173

Query: 179 WSLSDKDYLVIGVYKTDQTPVSFDGLSGGAVMANTQVASEFGTKEIGSIYIHVNDIQNS 238 

W ++ + +IGV K +SFD LS V N + S FGT + ++ + V + 

Sbjct: 174 VVWINGQPVEIIGVLKKVTGLLSFD-LSEMYVPFM-MMKSSFGTSDFSNVSLQVESADDI 231 

Query: 239 MNLGNQAADMLTNISHIKDGQYAVPDNSKIVEEINSQFSIMTTVIGSIAAISLLVGGIGV 298 

+ G +AA L N +H + Y V + +1 I +IMTT+IGSIA ISLLVGGIGV 
Sbjct: 232 KSAGKEAAQ-LVNDNHGTEDSYQVMTOffiEIAaGIGKVTAIMTTIIGSIAGISLLVGGIGV 290 

Query: 299 MNIMLVSVTERTREIGLRKALGATRLKILSQFLIESVVLTVLGGLIGLLLAQLSVGALGN 358

MNIMLVSVTERTREIG+RK+LGATR +IL+QFLIESWLT++GGL+G+ + AL + 

Sbjct: 291 MNIMLVSVTERTREIGIRKSLGATRGQILTQFLIESWLTLIGGLVGIGIG-YGGAALVS 349 

Query: 359 AMTLKGACISLDVALIAVLFSASIGVFFGMLPANKASKLDPIEALRYE 406 
A+ + IS V VLFS IGV FGMLPANKA+KLDPIEALRYE

Sbjct: 350 AIAGWPSLISWQWCGGVLFSMLIGVIFGMDPANKAAKLDPIEALRYE 397 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 56/247 (22%) , Positives = 101/247 (40%) , Gaps = 42/247 (17%) 

Query: 147 YKLVEGKHLENKDKNKI LMHKDLAKKNNLK VGDKIKIKSNLFDA 190 

Y +V G+ L + D + ++ DL K N K + DK + ++ 

Sbjct: 136 YDIVAGRTLTDNDYSNFSRIILLDTVLADDLFGKGNYKSALNKVVSLSDKDYLVIGVYKT 195 

Query: 191 DNEKVANETVEVEIKGLFDGHNSGGVSAAQELYENTLITDVHSAAKVYGNTEDTAVYQDA 250
D V+ FDG + GVA NT + A+GE ++Y 

Sbjct: 196 DQTPVS FDGLSGGAVMA NTQV ASEFGTKEIGSIYIHV 232 

Query: 251 TFFVKGDKNLDSVIKDL--GKLDINWREYNLIKSSSNYPALQQSISGIYSISNKLFVGSL 308 
++ NL + D+ I +Y + +S + S + ++ + SL

Sbjct: 233 ND - 1 QNSMNLGNQAADMLTNISHI KDGQYAVPDNSKIVEEINSQFS IMTTVIGSIAAISL 291 

Query: 309 IFAGVWSLLLFLWMNARKKEIAVLLSLGISKLEIFGQFIIEMVFISIPALLGSYFLAQY 368 
+ G+ V ++ + + R +EI + +LG ++L+I QF+IE V +++ L LAQ 
Sbjct: 292 LVGGIGVMNIMLVSVTERTREIGLRKALGATRLKILSQFLIESVVLTVLGGLIGLLLAQL 351

Query: 369 TADKLGN 375 
+ LGN 



Sbjct: 352 SVGALGN 358

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
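
The "INTEGRAL Likelihood ... Transmembrane ..." lines reported above give, for each predicted membrane-spanning segment, a score and two residue ranges. The Python sketch below is an illustration under stated assumptions, not a description of how the predictions were produced: `parse_integral_line` is a hypothetical helper, and labelling the first range "core" and the parenthesized range "extended" is an interpretation adopted here for convenience. It parses the lines and orders the segments along the sequence.

```python
import re

# e.g. "INTEGRAL Likelihood =-12.68 Transmembrane 423 - 439 ( 413 - 447)"
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_line(line):
    """Return a dict for one predicted segment, or None if the line does not match."""
    m = _TM.search(line)
    if m is None:
        return None
    core_start, core_end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
    return {
        "likelihood": float(m.group(1)),
        "core": (core_start, core_end),        # first range on the line
        "extended": (ext_start, ext_end),      # parenthesized range (interpretation assumed)
    }


lines = [
    "INTEGRAL Likelihood =-12.68 Transmembrane 423 - 439 ( 413 - 447)",
    "INTEGRAL Likelihood =-10.67 Transmembrane 16 - 32 ( 12 - 37)",
    "INTEGRAL Likelihood = -9.77 Transmembrane 303 - 319 ( 301 - 326)",
    "INTEGRAL Likelihood = -3.13 Transmembrane 343 - 359 ( 343 - 367)",
]
segments = [s for s in map(parse_integral_line, lines) if s is not None]
segments.sort(key=lambda s: s["core"][0])  # order from N-terminus to C-terminus
for s in segments:
    start, end = s["core"]
    print(start, end, end - start + 1, s["likelihood"])
```

Sorting by start position is a presentation choice; the source lists the segments in order of score instead.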

Example 416 

A DNA sequence (GBSx0452) was identified in S.agalactiae <SEQ ID 1351> which encodes the amino 
acid sequence <SEQ ID 1352>. This protein is predicted to be Vexp2 (b0879). Analysis of this protein
sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3194 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD47593 GB:AF140784 Vexp2 [Streptococcus pneumoniae] 
Identities = 142/207 (68%) , Positives = 169/207 (81%) 

Query: 1 MDILEIKNVNYSYANSKEKVLSGVNQKFELGKFYAIVGKSGTGKSTLLSLLAGLDKVQTG 60 
M +L++++V Y Y N+ E VL +N EE GKFY+I+G+SG GKSTLLSLLAGLD G

Sbjct: 1 MTLLQLQDVTYRYKNTAEAVLYQINYNFEPGKFYSIIGESGAGKSTLLSLLAGLDSPVEG 60 

Query: 61 KILFKNEDIEKKGYSNHRKNNISLVFQNYNLIDYLSPIENIRLVNKSVDESILFELGLDK 120 
ILF+ EDI KKGYS HR ++ISLVFQNYNLIDYLSP+ENIRLVNK ++ L ELGLD+ 
Sbjct: 61 SILFCflEDIRKKGYSYHRMHHISLWQimiLIDYLSPLENIRLWKKASKNTLLELGLDE 120

Query: 121 KQI KRNVMKLSGGQQQRVAIARALVSDAPI ILADEPTGNLDSVTAGE I INI LKELAQDRN 180 

QIKRNV++LSGGQQQRVAIAR+LVS+AP+ILADEPTGNLD TAG+I+ +LK LAQ 
Sbjct: 121 SQIKRNVI£LSG(KX»RVAIARSLVSEAPWIA^ 180 

Query: 181 KCVIWTHSKEVADSADI ILELSGKKL 207 

KCVIWTHSKEVA ++DI LEL KKL 
Sbjct: 181 KCVIWTHSKEVAQASDITLELKDKKL 207 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1353> which encodes the amino acid
sequence <SEQ ID 1354>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2717 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 83/230 (36%) , Positives = 135/230 (58%) , Gaps = 13/230 (5%) 

Query: 1 MDILEIKNVNYSYANSKEKVLSGVNQKFEL - - GKFYAIVGKSGTGKSTLLSLLAGLDKVQ 58 
M +E+K V+ SY + V + FE+ G+ I+G SG GKST+L++L G+D V 

Sbjct: 5 MAFIELKQVSKSYQIGETTVFANHEVSFEINKGELVVILGASGAGKSTVLNILGGMDTVD 64

Query: 59 TGKILFKNEDIE KKGYSNHRKNNISLVFQNYNLIDYLSPIENIRLVNKSVDES 111 

G+++ +DI K + +R+N I VFQ YNL+ L+ EN+ L + V ++ 

Sbjct: 65 AGQVIIDGKDI7AHYTSKALTQYRRNAIGFvEQFYNLOTNLTAKENVEIAVEIVADALDPV 124 



Query: 112 -ILFELGLDKKQIKKNVMIQjSGGQX3QRVAIARALVSDAPIILADEPTGNLDSVTAGEIIN 170 

IL E+GL + + +LSGG+QQRV+ IARAL + ++L DEPTG LD T +1+ 

Sbjct: 125 TILKEVGLSHR-LDHFPAQLSGGEQQRVSIARALAKNPKLLLCDEPTGALDYQTGKQILT 183

Query: 171 ILKELAQDRNKCVIWTHSKEVADSADIILELSGKKLKK--VNKMNLEVE 218

+L+++AQ + V++VTH+ +A AD ++ + ++ K +NK +E 
Sbjct: 184 LLQDMAQTKGTTWI VTHNAAIAP IADRVI FMHDAQVTKTVINKEPAS IE 233 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 417 

A DNA sequence (GBSx0453) was identified in S.agalactiae <SEQ ID 1355> which encodes the amino 
acid sequence <SEQ ID 1356>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.35 Transmembrane 17 - 33 ( 17 - 34)

Final Results

bacterial membrane Certainty=0.2338 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 418 

A DNA sequence (GBSx0454) was identified in S.agalactiae <SEQ ID 1357> which encodes the amino
acid sequence <SEQ ID 1358>. This protein is predicted to be Vexp1. Analysis of this protein sequence
reveals the following:

Possible site: 56

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.52 Transmembrane 294 - 310 ( 285 - 312)
INTEGRAL Likelihood =-10.67 Transmembrane 396 - 412 ( 385 - 417)
INTEGRAL Likelihood = -8.76 Transmembrane 17 - 33 ( 14 - 38)
INTEGRAL Likelihood = -4.14 Transmembrane 335 - 351 ( 333 - 357)

Final Results

bacterial membrane Certainty=0.5607 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD47592 GB:AF140784 Vexp1 [Streptococcus pneumoniae]
Identities = 165/425 (38%) , Positives = 271/425 (62%) , Gaps = 4/425 (0%) 

Query: 2 IKNAIAYITRKKNRTLIIFAILTIVLSCLYSCLTIMKSSNEIEKALYESSNSSISITK-K 60 
I+ + AY++RK+ R+ I+F IL ++L+ + +CLT+MKS+ +E LY+S N+S SI K + 

Sbjct: 4 IQRSWAYVSRKRLRSFILFLILLVLLAGISACLTIiMKSNKTVESNLYKSLNTSFSIKKIE 63 

Query: 61 DGKYFNINQFKNIEKIKEVEEKIFQYDGLAKLKDLKWSGEQSINREDLSDEFKNWSLE 120 
+G+ F ++ ++ KIK +E + + +AKLKD + V+GEQS+ R+DLS N+VSL 
Sbjct: 64 NGQTFKLSDLASVSKIKGLENVSPELETVAKLKDKEAVTGEQSVERDDLSAADNNLVSLT 123 

Query: 121 ATSNTKRNLLFSSGVFSFKEGKNIEENDKNSILVHEEFAKQNKLKLGDEIDLELLDTEKS 180 

A ++ +++ F+S F+ KEG+++++ D IL+HEE AK+N L L D+I L+ +E S 
Sbjct: 124 ALEDSSKDVTFTSSAFNLKEGRHLQKGDSKKILIHEELAKKNGLSLHDKIGLDAGQSE-S 182 

Query: 181 GKIKSHKFKIIGIFSGKKQETYTGLSSDFSENMVFVDYSTSQEILNKSENNRIANKILMY 240 

GK ++ +F+IIGIFSGKKQE +TGLSSDFSEN VF DY +SQ +L SE A + Y 
Sbjct: 183 GKGQTVEFEIIGIFSGKKQEKFTGLSSDFSENQVFTDYESSQTLLGNSEAQVSAARF--Y 240 


Query: 241 SGSLESTELAIiNKIjKDFKIDKSKYSIKKDNKAFEESLESVSGIKHIIKIMTYSIMLGGIV 300 

+ + + + ++++ ++ Y ++K+NKAFE+ +SV+ + + I Y +++ G 
Sbjct: 241 VENPKEMDGLMKQVENLALENQGYQVEKENKAFEQIKDSVATFQTFLTIFLYGMLIAGAG 300 

Query: 301 VLSLILILWLRERIYEIGIFLSIGTTKIQIIRQFIFELIFISIPSIISSLFLGNLLLKVI 360 

L L+L LWLRER+YE+GI L++G K I QF E++ +S+ +++ + GN + + 
Sbjct: 301 ALILVLSLWLRERVYEVGILLALGKGKSSIFLQFCLEWLVSLGALLPAFVAGNAITTYL 360 

Query: 361 VEGFINSENSMIFGGSLINKSSFMLNITTLAESYLILISIIVLSWMASSLILFKKPKEI 420 
++ + S + +L SS +I + AESY+ L+ + LSV + + K PKEI 

Sbjct: 361 LQTLLASGDQASLQDTLAKASSLSTSILSFAESYVFLVLLSCLSVALCFLFLFRKSPKEI 420 

Query: 421 LSKIS 425 
LS IS 

Sbjct: 421 LSSIS 425 
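
Each alignment in these Examples is headed by a summary line giving identities, positives and, where present, gaps as counts over the aligned length. The following Python fragment is a sketch under the assumption that the summary line has the form shown above; `parse_summary` and `fraction` are hypothetical names introduced here. It extracts the counts and recomputes the underlying fraction, which can be compared with the printed whole-number percentage.

```python
import re

# One "Name = count/length (percent%)" field of an alignment summary line.
PAIR_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")

def parse_summary(line):
    """Map each field name to (count, aligned_length, printed_percent)."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in PAIR_RE.findall(line)}

def fraction(count, length):
    """Fraction of aligned positions represented by count."""
    return count / length

# Summary line of the Vexp1 alignment in Example 418.
summary = parse_summary(
    "Identities = 165/425 (38%) , Positives = 271/425 (62%) , Gaps = 4/425 (0%)"
)
ident_count, ident_len, ident_pct = summary["Identities"]
print(ident_count, ident_len, ident_pct, round(100 * fraction(ident_count, ident_len), 1))
```

The recomputed value need not equal the printed percentage exactly, since the printed figure is a whole number.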

A related DNA sequence was identified in S.pyogenes <SEQ ID 1359> which encodes the amino acid 
sequence <SEQ ID 1360>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-11.57 Transmembrane 23 - 39 ( 16 - 43) 
INTEGRAL Likelihood =-11.36 Transmembrane 371 - 387 ( 362 - 396) 
INTEGRAL Likelihood = -8.12 Transmembrane 331 - 347 ( 324 - 360) 
INTEGRAL Likelihood = -7.70 Transmembrane 280 - 296 ( 277 - 308) 

Final Results 

bacterial membrane Certainty=0.5628 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAB97962 GB:U96166 ATP-binding cassette transporter-like protein 
[Streptococcus cristatus] 
Identities = 222/311 (71%) , Positives = 278/311 (89%) 

Query: 16 MRSILTMLGIIIGIGAI IAIFSIIEGNTENTKRQLIGGSNNTINIVFNKKSSIDPKFPDK 75 

MRS+LTMLGI I IGIGAI IAIFSI IEGNTENTKRQLIGGSNNTI +V++KKS+IDP P+K 
Sbjct: 1 MRSMLTMLGIIIGIGAIIAIFSIIEGNTENTKRQLIGGSNNTIKWYDKKSAIDPSIPEK 60 

Query: 76 SNAKKPDYLPFMAEEELSKIQQVKGVKNALISYGIDDKVYHLGQKSSAKISAITKNVAEV 135 

S A+KP Y+PFM E+ LSKI+++ GVKNAL++YG D+K+Y+L QKSS+K+ A++++VA++ 
Sbjct: 61 SQAQKPSYIPFMGEDVLSKIKEIPGVKNALMTYGADEKIYYLSQKSSSKVQAVSQSVADI 120 

Query: 136 RRMTFIKGSDFSDKDFIDQKQVIYLEKSLYESLFPKDDGLGKFVEVMGNPFRVIGVFESK 195 
++ ++G F + F +Q+QV YLEKSLY++LFPK DG+GK+VEV GNPF+VIGVFES 

Sbjct: 121 KQQRLLEGEGFDSEAFKNQEQVAYLEKSLYDTLFPKGDGIGKYVEVKGNPFKVIGVFEST 180 

Query: 196 EQSGLTSGTEKIAYIPLHQWYNINGWDATPEITIQTYRADDLKPVAKRVSDMLNQTIPK 255 
EQSGLTSG+EK+AYIPL QW+ I ++ +PE+T+QT++ADDLK VAK+VSD LNQ +P+ 
Sbjct: 181 EQSGLTSGSEKVAYIPLQQWHRIFDTIWSPEVWQTHKADDLKKVAKKVSDYLNQQMPQ 240 

Query: 256 SDYMFGVT^KEFERQLDNLNKSNFVLLAGIASISLIVGGIGVMNIMLVSVTERTREIGI 315 

SDYMFGV+NL+EFERQLDNLN+SNFVLLAGIASISL+VGGIGVMNIMLVSVTERTREIGI 
Sbjct: 241 SDYMFGVLNLQEFERQLDNLNQSNFVLLAGIASISLLVGGIGVMNIMLVSVTERTREIGI 300 






Query: 316 KKALGARRKLI 326 

KKALGARRK++ 
Sbjct: 301 KKALGARRKIL 311 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 79/386 (20%) , Positives = 170/386 (43%) , Gaps = 38/386 (9%) 

Query: 5 AIAYITRKKmTLIIPAILTIvLSCLYSCLTIMKSSNE-IEKALYESSNSSISITKKDGK 63 

A++ I K R+++ + I + + + + E ++ L SN++I+I 
Sbjct: 7 ALSSILSHKMRSILTMLGIIIGIGAIIAIFSIIEGNTENTKRQLIGGSNNTINIV 61 

Query: 64 YFNINQFKNIEKIKEVEEKIFQYDGLAKLKDLKVVSGEQSINREDLSDEPKNVVSLEATS 123 

FN K ++ K F AK D E+ +++ KN + 

Sbjct: 62 -FN KKSSIDPK-FPDKSNAKKPDYLPFMAEEELSKIQQVKGVKNALISYGID 111 

Query: 124 NTKKMLLFSSGVFSFKEGKNIEENDKNSILVHEEFAKQNKLKLGDEIDLELLDTE 178 

+ +L S KN+ E + + + +F+ ++ + I LE E 

Sbjct: 112 DKVYHLGQKSSAKISAITKNVAEVRRMTFIKGSDFSDKDFIDQKQVIYLEKSLYESLFPK 171 

Query: 179 KSGKI KSHKFKI IGI FSGKKQET YTGLSSDFSENMVFVDYSTSQE I IiNKSENNRI 233 

K ++ + F++IG+F K+Q +GL+S +E + ++ I + 

Sbjct: 172 DDGLGKFVEVMGNPFRVIGVFESKEQ SGLTSG - TEKIAYI PLHQWYNINGWDATPE 227 

Query: 234 ANKILMSGSLESTEIALNKLKDFKIDKSKYSIKKDN-KAFEESLESVSGIKHIIK--IM 290 

+ L+ ++ + + I KS Y N K FE L++++ ++ I 

Sbjct: 228 ITIQTYRADDLKPVAKRVSDMINQTIPKSDYMFGVMNLKEFERQLDNLNKSNFVLLAGIA 287 

Query: 291 TYSIMLGGIWLSLILILWLRERIYEIGIFLSIGTTKIQIIRQFIFELIFIS IPSI 346 

+ S+++GGI V++++L+ + ER EIGI ++G + I++QF+ E + ++ + + 
Sbjct: 288 SISLIVGGIGVMNIMLVS- VTERTREIGIKKALGARRKLILKQFLIEAVILTLLGGVIGV 346 

Query: 347 ISSLFLGNLLLKVIVEGFINSENSMI 372 

IS + G ++ + + +1 S S++ 
Sbjct: 347 ISGMVSGLI ITRSLEYPYILSLFSW 372 

A related GBS gene <SEQ ID 8571> and protein <SEQ ID 8572> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 

McG: Discrim Score: 5.59 

GvH: Signal Score (-7.5): -5.97 
Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq 

ALOM program count: 4 value: -11.52 threshold: 0.0 

INTEGRAL Likelihood =-11.52 Transmembrane 294 - 310 ( 285 - 312) 
INTEGRAL Likelihood =-10.67 Transmembrane 396 - 412 ( 385 - 417) 
INTEGRAL Likelihood = -8.76 Transmembrane 17 - 33 ( 14 - 38) 
INTEGRAL Likelihood = -4.14 Transmembrane 335 - 351 ( 333 - 357) 
PERIPHERAL Likelihood = 4.51 315 
modified ALOM score: 2.80 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.5607 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

38.7/67.3% over 421aa 
Streptococcus pneumoniae 
GP|5712667| Vexp1 Insert characterized 

ORF00815(304 - 1575 of 1875) 
GP|5712667|gb|AAD47592.1|AF140784_1|AF140784 (4 - 425 of 425) Vexp1 {Streptococcus pneumoniae} 
%Match = 25.0 
%Identity = 38.7 %Similarity = 67.2 
Matches = 164 Mismatches = 136 Conservative Sub.s = 121 

48 78 108 138 168 198 228 258 

SIEH*WFDNKTI*T*ELDFVSHSS**VI*DFPLNK*IRNSVTSYINGSIIEIVCQMKKF*WK*F*KH*L*AM*KY*SSG 

288 318 348 378 408 438 468 495 

OTSCGVKIERSN*EVIKNAIAYITRKKlffiTIiIIFAILTIVLSCLYSCLTI^SSNEIEKaLYESSNSSISITK-KDGKyF 

|: = ||::||= | = :| = | II : = l= = Hlhllh =1 11 = 1 hi II I = = 1 = I 
MNPIQRSWAYVSRKRLRSFILFLILLVLI1AGISACI1TLMKSNKTVESNLYKSLNTSFSIKKIENGQTF 

10 20 30 40 50 60 

525 555 585 615 645 675 705 735 

NINQFKNIEKIKEVEEKIFQYDGI^KIjKDLKWSGEQSINREDLSDEFKNWSLEATSNTKRNLLFSSGVFSFKEGKNIE 

:: : :: ||| :| : : :||||| = hlllh hill hill I = = = = = 1 = 1 h = llh = = = 

KLSDLASVSKIKGLEWSPELEOTAKLKDKEAWGEQSVERDDLSAM3IMLVSLTALEDSSKDVTFTSSAFNLKEGRHLQ 

80 90 100 110 120 130 140 






765 795 825 855 885 915 945 975 

EISTOKNSILVHEEFAKQNKLKLGDEIDLELIjDTEKSGKIKSHKFKIIGIFSGKKQETYTGLSSDFSENMVFVDYSTSQEIL 



KGDSKKILIHEELAKKNGLSLHDKIGLDAGQSE-SGKGQTVEFEIIGIFSGKKQEKFTGLSSDFSENQVFTDYESSQTLL 
160 170 180 190 200 210 220 

1005 1035 1065 1095 1125 1155 1185 1215 

NKSE^M^IANKILIC^SGSLESTELALNKLKDFKIDKSKYSIKKDNKAFEESLESVSGIKHIIKIMTYSIMLGGIVVLSLI 

|| : | : : : : ::::= :: I -hllllh =lh = = I I = = = I 11 = 

GNSEA- -QVSAARFYVFJ^PKEMDGLMKQVENLALENQGYQVEKENKAFEQIKDSVATFQTFLTIFLYGMLIAGAGALILV 

240 250 260 270 280 290 300 

1245 1275 1305 1335 1365 1395 1425 1455 

LILWLRERIYEIGIFLSIGTTKIQIIRQFIFELIFISIPSIISSLFLGNLLLKVIVEGFINSENSMIFGGSLINKSSFML 

| ||||||:|hlhh = l I I II =l = = = = l= ::= == II = : = : = = I = = =1 11 = 

LSLWLRERVYEVGILLALGKGKBSIFLQFCLEWLVSLGALLPAFTO 

320 330 340 350 360 370 380 

1485 1515 1545 1575 1605 1635 1665 1695 

NITTLAESYLILISIIVLSVVI^SSLIDFKKPKEILSKIS*EQI^ILEIKNVOTSYANSKEKA^SGVNQKFELGKFYA1 

' =| ::||||= h = III = === I llllll II 
SILSFAESYVFLVLLSCLSVALCFLFLFRKSPKEILSSIS 

400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 419 

A DNA sequence (GBSx0455) was identified in S.agalactiae <SEQ ID 1361> which encodes the amino 
acid sequence <SEQ ID 1362>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -5.04 Transmembrane 19 - 35 ( 14 - 42) 

Final Results 

bacterial membrane Certainty=0.3017 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 420 

A DNA sequence (GBSx0456) was identified in S.agalactiae <SEQ ID 1363> which encodes the amino 
acid sequence <SEQ ID 1364>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 421 

A DNA sequence (GBSx0457) was identified in S.agalactiae <SEQ ID 1365> which encodes the amino 
acid sequence <SEQ ID 1366>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA74029 GB:U30715 ORFB [Bacillus anthracis] 
Identities = 33/76 (43%) , Positives = 44/76 (57%) , Gaps = 1/76 (1%) 


Query: 11 IRRVSHACTKAGDRFYEENIIiNREFTATAHNQKWCTDVTYLQYGLG/AKAYLSAIKDLYNG 70 

++R R EN+LNR F A N+KW TD+TYL +G YL +1 DLYN 

Sbjct: 86 VKRKRRTWINGESRIVVENLLJsIRNFQANKPNEKWOTDITYLPFGT-EMLYLLSIMDLYNN 144 

Query: 71 SIIAYEISHNNEIHLL 86 

IIAYEIS+ ++ L+ 
Sbjct: 145 EIIAYEISNRQDVTLV 160 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 422 

A DNA sequence (GBSx0458) was identified in S.agalactiae <SEQ ID 1367> which encodes the amino 
acid sequence <SEQ ID 1368>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.69 Transmembrane 10 - 26 ( 10 - 26) 

Final Results 

bacterial membrane Certainty=0.1277 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 423 

A DNA sequence (GBSx0459) was identified in S.agalactiae <SEQ ID 1369> which encodes the amino 
acid sequence <SEQ ID 1370>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4170 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA56999 GB:U09558 ORFA, putative Helix-Turn-Helix motif from 
amino acid 21 through 42 and from amino acid 78 through 
99 [Lactobacillus johnsonii] 
Identities = 28/116 (24%) , Positives = 59/116 (50%) , Gaps = 6/116 (5%) 

Query: 3 YSTLAKEQGVQGYLDGKGSLRD I CKWYDI SSRSVLQKWI KRYTSGEDLKATSRGYSRMKQ 62 

YST K + V YL+ + S++ + K Y+I +++++W+ + + L A S +++ 
Sbjct: 4 YSTELKIEIVSKYLNHEDSIKGLAKQYNIHW-TLIRRWVDK-AKCQGLAALSVKHTKTTY 61 


Query: 63 GRQATFEERVEIVNYTIAHGKDYQAAIEKFGVSYQQIYSWVRKLEKNGSQGLVDRR 118 

+ ++ +V Y + H KF +S Q+Y+W +K + G GL+ ++ 

Sbjct: 62 SS DFKLNVVRYYLTHSIGVSKVAAKFNISDSQVYNWAKKFNEEGYAGLLPKQ 113 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 424 

A DNA sequence (GBSx0460) was identified in S.agalactiae <SEQ ID 1371> which encodes the amino 
acid sequence <SEQ ID 1372>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.69 Transmembrane 2 - 18 ( 2 - 19) 

Final Results 

bacterial membrane Certainty=0.1277 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 425 

A DNA sequence (GBSx0461) was identified in S.agalactiae <SEQ ID 1373> which encodes the amino 
acid sequence <SEQ ID 1374>. This protein is predicted to be integrase (phage-relatedpr). Analysis of this 
protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC79517 GB:U88974 ORF1 [Streptococcus thermophilus temperate 
bacteriophage O1205] 

Identities = 104/172 (60%) , Positives = 127/172 (73%) , Gaps = 11/172 (6%) 

Query: 10 QHQSYAALYLI AKTGMRFAECLGLTVND I DYTNKYLS INKTWDYHFNQRYLPTKNKSS IR 69 
++ SYAALY+ I +KTG+RFAECLGLTV+DI LS+NKTWDY N ++PTK KSSIR 

Sbjct: 186 EYASYJU^YIlSKTGIRFi^CLGLTvDDIKRDTG^^LSWKTWDYKNNTGFMPTKTKSSIR 245 

Query: 70 NIPIDNDTLFFLHEFTKNKNDRLFDKLSNNAVNKTIRKITGREVRVHSLRHTFASY 125 

IP+D++ + F+ + + RL LSNNAVNKT+RKI GREVRVHSLRHT+ASY 

Sbjct: 246 EIPLDDEFINFIDQLPPTDDGRLLPSLSNNAVNJCTLRKIVGREVRVHSLRHTYASYLIAH 305 






Query: 126 LISISQVLDHENLNITLEVYAHQLQEQKDRNDKLNQRNLGRIWGKIALN 174 

LIS+SQVL HENLNITLEVYAHQLQEQK RND+ + ++W K N 

Sbjct: 306 DIDLISVSQVLGHENLNITLEVYAHQLQEQKSRNDE KIKQMWTKCGQN 353 



There is also homology to SEQ ID 578. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 426 

A DNA sequence (GBSx0462) was identified in S.agalactiae <SEQ ID 1375> which encodes the amino 
acid sequence <SEQ ID 1376>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3206 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 1328. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 427 

A DNA sequence (GBSx0463) was identified in S.agalactiae <SEQ ID 1377> which encodes the amino 
acid sequence <SEQ ID 1378>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.6542 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB52541 GB:AJ131519 hypothetical protein [Lactobacillus 
bacteriophage phi adh] 

Identities = 24/55 (43%) , Positives = 36/55 (64%) 

Query: 12 MDKELTPQEKANKKWAENNREHRTYLSKRSTARSFINKNATKEDLLELKQLIESK 66 
M K + KANKKW E N+ + Y++KRSTA+SFI AT+EDL +++ + + 
Sbjct: 1 ^KITEARAKANKKWDEKNKARKLYINKRSTAKSFILNIATEEDLANIEEYVAER 55 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 428 

A DNA sequence (GBSx0464) was identified in S.agalactiae <SEQ ID 1379> which encodes the amino 
acid sequence <SEQ ID 1380>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4417 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 1332. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 429 

A DNA sequence (GBSx0465) was identified in S.agalactiae <SEQ ID 1381> which encodes the amino 
acid sequence <SEQ ID 1382>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



WO 02/34771 PCT/GB01/04789 

Example 430 

A DNA sequence (GBSx0466) was identified in S.agalactiae <SEQ ID 1383> which encodes the amino 
acid sequence <SEQ ID 1384>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.30 Transmembrane 205 - 221 ( 202 - 223) 
INTEGRAL Likelihood = -3.56 Transmembrane 296 - 312 ( 294 - 312) 

Final Results 

bacterial membrane Certainty=0.2720 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9663> which encodes amino acid sequence <SEQ ID 9664> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8573> and protein <SEQ ID 8574> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -8.80 
GvH: Signal Score (-7.5): -4.03 
Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

ALOM program count: 2 value: -4.30 threshold: 0.0 
INTEGRAL Likelihood = -4.30 Transmembrane 205 - 221 ( 202 - 223) 
INTEGRAL Likelihood = -3.56 Transmembrane 296 - 312 ( 294 - 312) 
PERIPHERAL Likelihood = 2.97 20 
modified ALOM score: 1.36 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.2720 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 
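
The "INTEGRAL" lines above give, for each predicted transmembrane segment, a likelihood score, a core residue range and an extended range in parentheses. The following Python fragment is an illustrative sketch only, assuming that line layout; `parse_tm_segments` is a hypothetical helper name. It collects the segments so that their number and most negative likelihood can be compared with the "ALOM program count" and "value" fields printed in the same output.

```python
import re

# "INTEGRAL Likelihood = -4.30 Transmembrane 205 - 221 ( 202 - 223)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text):
    """Return one dict per predicted transmembrane segment found in text."""
    segments = []
    for m in TM_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        })
    return segments

# Lines copied from the SEQ ID 8574 analysis in Example 430.
tm_text = """
INTEGRAL Likelihood = -4.30 Transmembrane 205 - 221 ( 202 - 223)
INTEGRAL Likelihood = -3.56 Transmembrane 296 - 312 ( 294 - 312)
"""
segments = parse_tm_segments(tm_text)
most_negative = min(s["likelihood"] for s in segments)
print(len(segments), most_negative)
```

In this Example the segment count (2) and the most negative likelihood (-4.30) agree with the "ALOM program count: 2 value: -4.30" line.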

SEQ ID 8574 (GBS366) was expressed in E.coli as a GST-fusion product. The purified fusion protein 
(Figure 215, lane 5) was used to immunise mice. The resulting antiserum was used for FACS (Figure 281), 
which confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 431 

A DNA sequence (GBSx0467) was identified in S.agalactiae <SEQ ID 1385> which encodes the amino 
acid sequence <SEQ ID 1386>. This protein is predicted to be N-acetylmuramoyl-L-alanine amidase. 
Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1471 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8575> which encodes amino acid sequence <SEQ ID 8576> 
was also identified. This has an RGD motif at residues 81-83. 
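
A motif position such as the RGD site noted above can be located by a simple scan of the amino acid sequence. The following Python fragment is a sketch only: `find_motif` is a hypothetical helper, and the test sequence is an invented placeholder (SEQ ID 8576 itself is not reproduced in this passage) constructed so that RGD falls at residues 81-83. Positions are reported 1-based and inclusive, as in the text.

```python
def find_motif(sequence, motif="RGD"):
    """Return 1-based inclusive (start, end) ranges for every occurrence of motif."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return hits

# Hypothetical placeholder sequence, NOT the real SEQ ID 8576:
# 80 filler residues followed by RGD, so the motif occupies residues 81-83.
toy_sequence = "M" + "A" * 79 + "RGD" + "K" * 5
print(find_motif(toy_sequence))
```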

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB07986 GB:Z93946 N-acetylmuramoyl-L-alanine amidase 
[bacteriophage Dp-1] 
Identities = 99/140 (70%) , Positives = 120/140 (85%) 

Query: 10 ^WINIEQAIAWMASRKGKVTYSMDYRNGPSSYDCSSSVYFALRSAGASDNGWAVNTEYEH 69 

M ++IE+ +AWM +RKG+V+YSMD+R+GP SYDCSSS+Y+ALRSAGAS GWAVNTEY H 
Sbjct: 1 MGVDIEKGVAWMQARKGRVSYSMDFRDGPDSYDCSSSMYYALRSAGASSAGWAVNTEYMH 60 

Query: 70 DWLIKNGYVLIAENTNWNAQRGDIFIWGKRGASAGAFGHTGMFVDPDNIIHCNYGYNSIT 129 

WLI+NGY LI+EN W+A+RGDIFIWG++GASAGA GHTGMF+D DNIIHCNY Y+ 1+ 
Sbjct: 61 AWLIENGYELISENAPWDAKRGDIFIWGRKGASAGAGGHTGMFIDSDNIIHCNYAYDGIS 120 

Query: 130 VNNHDEIWGYNGQPYVYAYR 149 
VN+HDE W Y GQPY Y YR 

Sbjct: 121 VNDHDERWYYAGQPYYYVYR 140 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1387> which encodes the amino acid 
sequence <SEQ ID 1388>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.06 Transmembrane 79 - 95 ( 77 - 95) 

Final Results 

bacterial membrane Certainty=0.1426 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 56/91 (61%) , Positives = 68/91 (74%) 

Query: 158 KVDNQSWSKFEKELDVNTPLSNSNMPYYEATISEDYYVESKPDVNSTDKELLVAGTRVR 217 
K+D F ++LD NT L NSN+PYYEAT+ DYYVESKP+ +S DKE + AGTRVR 

Sbjct: 354 KIDKPQSQLTFNQKLDTNTKLDNSNVPYYEATLRTDYYVESKPNASSADKEFIKAGTRVR 413 

Query: 218 VYEKVKGWARIGAPQSNQWVEDAYLIDATDM 248 

VYEKV GW+RI A QS+QWVED YL +AT + 
Sbjct: 414 VYEKVNGWSRINASQSDQWVEDKYLSNATQV 444 


SEQ ID 8576 (GBS301) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 9; MW 30kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 49 (lane 3; MW 55kDa). 

The GBS301-GST fusion product was purified (Figure 205, lane 4) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 300), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 432 

A DNA sequence (GBSx0468) was identified in S.agalactiae <SEQ ID 1389> which encodes the amino 
acid sequence <SEQ ID 1390>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -6.53 Transmembrane 8 - 24 ( 3 - 25) 

Final Results 

bacterial membrane Certainty=0.3612 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 433 

A DNA sequence (GBSx0469) was identified in S.agalactiae <SEQ ID 1391> which encodes the amino 
acid sequence <SEQ ID 1392>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 434 

A DNA sequence (GBSx0470) was identified in S.agalactiae <SEQ ID 1393> which encodes the amino 
acid sequence <SEQ ID 1394>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0120 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 435 

A DNA sequence (GBSx0471) was identified in S.agalactiae <SEQ ID 1395> which encodes the amino 
acid sequence <SEQ ID 1396>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 



A related GBS nucleic acid sequence <SEQ ID 9661> which encodes amino acid sequence <SEQ ID 9662> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 436 

A DNA sequence (GBSx0472) was identified in S.agalactiae <SEQ ID 1397> which encodes the amino 
acid sequence <SEQ ID 1398>. This protein is predicted to be a minor structural protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.39 Transmembrane 349 - 365 ( 347 - 366) 

Final Results 

bacterial cytoplasm Certainty=0.4757 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

Final Results 

bacterial membrane Certainty=0.1956 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database: 



>GP:AAF43531 GB:AF145054 ORF39 [Streptococcus thermophilus 
bacteriophage 7201] 
Identities = 212/666 (31%) , Positives = 323/666 (47%) , Gaps = 52/666 (7%) 



Query: 10 WGNNLTLEILSAWNKP NIASNTSTVNVQVFL KMSSYGYISIGETRPLKITVD 61 

W NN + W +1 +NTS V +++ L + Y + E ++ 

Sbjct: 5 WSNNDRGYRIRLVm3QVGQDIQNNTSQWLRLSLIOTTTTFAQYSCSAFVEFNGQRLNWS 64 



Query: 62 GRAETINVNPSINYGQRKLLFAKDYIVNHNSDGNKPLFNISAYYPIN- -FSNYGEATANQ 119 

G +N+I L+VH DG+ +F + A++ + +S NQ 

Sbjct: 65 GSPSVLGWNQTIQ LIDQTITVRHADDGSG-VFGVHAHFNGSGGWSPGNLDIGNQ 117 



Query: 120 SISLPKINRLSVSSAISGVLGNAVTITINRYSTSFTHNLKYDFKGSTGTIATGVGTSYLW 179 

I+L IRS G +GN V I+I+R TH L+Y ++ G IA VGTSY W 

Sbjct: 118 QITLTTIPRGSSWVSDGFIGNQVDISIDRKIGGATHTLRYAWENKQGKIADNVGTSYKW 177 



Query: 180 TIPPTFANLLPNELTGTGNLIVETMDGSAKIGETKYTLSITIPNTATYKPKLSSITLSDT 239 

TIP FAN +PN +G G + V+T I TL+ ++ T KP + TL+DT 

Sbjct: 178 TIPEDFANDIPNSTSGRGTIYVDTYINGNFIQTQSTTLTASV-ITNNLKPSFTGFTLTDT 236 



Query: 240 NTLTSSIVSG-NNFTOIISKVKVDFGSAIGNNGSTITSYNAEIVGKSNSIIGNGSVFDKL 298 

N + IV G +FV I+S VKV F A +G+TI Y AEIVG +NSI NG V ++ 
Sbjct: 237 NPTSQRIVPGQTHFVSIMSLVKVVFNGAQAKSGATIvGYYAEIVGANNSISSNGGVLREV 296 

Query: 299 DFFGSA--TIRATVTDSRGLTSEPVDTKINVIDYFLPIVTSAKWRSQQNPDILQVLPFV 356 

T+R V DSRG+ S+ V+TK+ + YF P + +V RS + DIL + F 
Sbjct: 297 SVNQDTEMTLRGRVQDSRGIWSDWVETKLTFLFYFSPAL-RFEVKRSDKKLDILTIKRFA 355 


Query: 357 KIAPIIVGGIQKNQLKMSVSVAPYNTGIYAVDSGAATNTWSTISQMSGAPLNLGGTYDKS 416 

KIAP+ V GIQ+N +K++ S A + VD+G A WS+IS+ + + LG +Y 

Sbjct: 356 KIAPLSWGIQRNVMKLTFSTAKVGWDNFWDM3QAGGVWSSISEFNASDAKLGNSYPAD 415 

Query: 417 KSWLVKISVSDNLMSATPIIQPVASEFVLVTKAPSGVAFGKIWEHGIIDAKGDVYVDGTI 476 
S++V + D ST V ++ V++T GV GK E G +D GD I 
Sbjct: 416 TSYWIGKLEDEFTS-TSFQATVPTDEVIMTYDRQGVGIGKYRERGALDVNGD 1 468 

Query: 477 YCGDKAIQQKPIAIil^GGSFRHDDTDmSLQDTGFYCVFRGANRPAGAGPGYVTVVRHET 536 
Y + IQQ L NNG ++ N+++D GY+FAP + + H + 

Sbjct: 469 YANNSPIQQYQLTNNNGSPKMTNNA- -NTIEDPGQYYLFSAA- - PGNPSGQWGHLFHHSS 524 

Query: 537 ANYAYQQFYDRTNKTI FTRLLENGVWSGWSEYVKKD - - SLQTTGWITIG 583 

A Q F+ + ++R++++ W W E+ + D +L TGW G 

Sbjct: 525 YGKGSMYKEAIQIFWSNDGRLFSRHHRWSRI IDD- -WEPWKEFARNDNTNLINTGWQPAG 582 

Query: 584 -NGFKYKRKGDDIDLMYNFASNGLQRWSVGNMPSGLI--PQELMFAITGWTLAPDKSIHL 640 

+G YKR GD + + +NF G + + ++P + PQ MF +TGW++ +K ++ 
Sbjct: 583 VDGSFYKRVGD VLT I KFNFTGTG - GDFLLASVPPEI FKAPQSYMFWTGWS WANKQYNV 641 


Query: 641 QINASG 646 

Q+N G 
Sbjct: 642 QVNEGG 647 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1398 (GBS365) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 82 (lane 2; MW 102kDa). 

GBS365-GST was purified as shown in Figure 216, lane 11. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 437 

A DNA sequence (GBSx0473) was identified in S.agalactiae <SEQ ID 1399> which encodes the amino 
acid sequence <SEQ ID 1400>. This protein is predicted to be a minor structural protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3481 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC34413 GB:AF158600 putative minor structural protein
[Streptococcus thermophilus bacteriophage Sfi11]

Identities = 504/998 (50%) , Positives = 675/998 (67%) , Gaps = 56/998 (5%) 

Query: 1 MLTIHGPDLKPvnFLDNDKQGAUSTYFNHK^ 60 
+LTIH +L+ V ++DN+KQ LN+FN KW R ++G+SV EFSV+KK + DS + Y 
Sbjct: 2 LLTIHDNNLQKVAYIDNEKQST^FFNDKOTRSLESGTSVF'EFSVFKKSIKSDSKVEISY 61

Query: 61 HVLNDQAFVSFTOKGKVQLIMIMKIDEDEKQIDOTCEN^ 120
MJ++AFVSF HKGK L N+MKI+EDE+ I CYCENL+LELL EY AYKA+K M+F+
Sbjct: 62 KYLNERAEVSFKHKGKSYLFOTMKIEEDEQIIRCYCENLSLELLLEYRGAYKASKPMTFK 121 

Query: 121 EYLVQFDILSWGALWGlOTVKDKKLTLEWTSQETKLRRLLSIAimFDAEIEFETKLNFN 180 
EY + + + LT+G NEV D+K TLEW QET IARL+S+A NFDAEIEF+T+L N

Sbjct: 122 EYFDDWGMGQFAKLTLGVlTOVSDQKRTLEWEGQETTLftRLISLARNFDAEIEFDTRLKPN 181 

Query: 181 HTFKQLIINIYKEYEEGKSYGVDRDKTDVILRYQKNISGIRKTVDKRQIYNAIRPYGKK- 239 
+ ++N+YK Y+ GK+ GV R ++DVIL+Y KNI+GI+++VDK QIYN I PYG+K 
Sbjct: 182 SQLDEFVLNVYKAYD-GKNQGVGRRRSDVILKYGKNINGIKRSVDKTQIYNMITPYGRKS 240

Query: 240 - TVRGERVI SNPVTRKVTKTVGSNRT YLGGDLKYYGHTI KKANVQAI INYAVQYNI L 295 

T + + IS+PVT + VSR Y GGDL Y GHT+ + VQ I N VQ N+L 
Sbjct: 241 DTKKETKRI SDPVT I QNPWVPSARVEKRYAGGDLTYAGHTLSASLVQTI FNLCVQRNLL 300 


Query: 296 PSGIITQLYLESFWGDSTVGKRDNNWAGMSGGAQTRPSGVKVTTGMARPANEGGTYMHYA 355 

PSG+I+QLYLESFWG S V +RDNNW+GM+GGAQTRPSGV VTTG RPA+EGGTYMHYA 
Sbjct: 301 PSGVISQLYIiESFWGSSNVARRDNNWSGMTGGAQTRPSGVWTTGSPRPASEGGTYMHYA 360 

Query: 356 SVDDFLKDYTYLLAKQG IYNWGKKNIADYTKGLFRAGGAKYDYAAAGYQSYTNL 410

SVDDF+KDYTYLLA Q +Y V GK+NI +YTKGLFR GGA YDYAAAGY Y L 

Sbjct: 361 SVDDFMKDYTYLIADQTSGGRKMYGVKGKQNIEEYTKGLFRIGGALYDYAAAGYNHYIYL 420 

Query: 411 MTNIRNGINKOTGNIIJOTIDKLWQTPVKPITAVNVARRATKTIQA INEATKLKG 464

M +IRNGIN+ GNIL+ +D LW+ P IT N ++ T+T++A +NE LKG

Sbjct: 421 MRDIRNGINRSNGNILDKLDDLWRQPDNQITQPN- -KQVTRTVKADRVIAVLNEMQGLKG 478 

Query: 465 RRIGSGQCYALSGWYAKKLDGAWIDSSIGGIRGRIGGGMAAALIGTDYNWGAYGWKVDKS 524 
RR+G+GQCYAL+ WY+ KLG +++G GIG GMAAA IGTDY W +GW V + 
Sbjct: 479 RRVGNGQCYALAAOTSMKLGGPGLGAGVTGKSGVIGAGMAAAKIGTDYAWDRFGWSWRP 538

Query: 525 PNAGNLKAGGIYNVRANRGAPFYTTGWGHTGIIKSVSKTRVTVLEQNFVGRMYVVENSYD 584 

+ LK G I N++A T+ WGH II S + + VTVLEQN+ GR YW+NSY 

Sbjct: 539 TSVDQLKPGAIANIKAYNSY-LGTSWGHVSIIISNNGSTVTVLEQNYAGRQYWQNSYP 597 


Query: 585 INSFASGLQTVCYPREIAQGMSVNGATTQQVSGGTQISYEEWQEAQTESYEEEQIIYID 644 

+++ ++T+CYP E+ +G +V G T + ++ E+ + E + ID 

Sbjct: 598 ASAYLGAVETLCYPPELKEGKTVEGRTETVSTPNVEVQKVEIPPIDVEVTTESTAALTID 657 

Query: 645 NSIYKEWKDENGKVEYYLKNGFLYAPLSRDRYPSVLTGNETRDNWIRKDMEVETDSQEVL 704

+ +EW++ENG+VE+YL+NG LYAP+S++ YPS+LTG E DNWIRKDME++TDS++VL 
Sbjct: 658 SKRKQEWRNENGQVEFYLENGSLYAPISKELYPSILTGKENGDNWIRKDMEIDTDSEDVL 717 

Query: 705 MSTGLKDLKAHft.YPAITYEVDGYVDLELGDWRIQDDGYEPPLILTARWEQEISITNPS 764 
+ST L++L+ YPAITYEVDG++DL++GD V+IQD G+ P L+L ARV EQ+IS TNP

Sbjct: 718 ISTALRNLRKFCYPAITYEVDGFLDLDIGDTVKIQDTGFSPMLMLEARVSEQQISFTNPV 777 

Query: 765 SNKTKFSNFVEKESQLASDLISDMLRLYDESIPYEIKLATSNGVAFKNGTGESVLTPSLQ 824 
NKT F+NF +++++ L+S M +L +E+IPYE+KL+T NG FKN TG+SVL +L+ 
Sbjct: 778 ENKTVFANFQTLQNKATSDSLLSRMTKIAEEAIPYELKLSTDNGTTFKNSTGQSVLKATLE 837

Query: 825 KKGI03YEAVYFYKNGDSLIDIGPSLIVKASDFNHV]^ITVEAYLNEELVASTQISFTDTE 884 

KNG+ Y+ ++F+KNGDS+I G L+VK +DF + L +TVEAYL++ELVAS +I+FTD 
Sbjct: 838 KNGEWQPIFFFKNGDSIIGTGNQLWKPTDFENTLQVTVEAYLDDELVASAEITFTDVS 897 


Query: 885 DGADGKDGAPGPQGPPGVNGLQGPKGDQGIQGPAGADGKATYTHIAYALDENGSTGFSVS 944 

DG QGPKGD G+ L S G+ 

Sbjct: 898 DGK OGPKGDDGVS PINLIIESSNGYQFK 925 

Query: 945 DNVGKTYI - -GMYVDDNI IDSNDPK- KYKWNLI KGADG 979

+N+ T +Y D+ ID + + Y W+ + ADG 

Sbjct: 926 NNI INTTFTAKLYQDNKEIDKDGTRYAYLWSKV-NftDG 962 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1401> which encodes the amino acid 
sequence <SEQ ID 1402>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -3.56 Transmembrane 325 - 341 ( 323 - 343)

Final Results

bacterial membrane Certainty=0.2423 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 23/55 (41%), Positives = 27/55 (48%)

Query: 886 GADGKDGAPGPQGPPGVNGLQGPKGDQGIQGPAGADGKATYTHIAYALDENGSTG 940 

G GKDGAPG G PG G +G +G+ G QGP G G+ T G G 

Sbjct: 181 GEAGKDGAPGKDGAPGEKGEKGDRGETGAQGPVGPQGEKGETGAQGPAGPQGEAG 235 
Identities = 48/151 (31%), Positives = 58/151 (37%), Gaps = 19/151 (12%)

Query: 852 KASDENHVLNITVEAYLNE - -ELVASTQI S FTDTEDGADGKDGAPGPQGPPGVNGLQGPK 909 

KDF Ij ELE+L++I +GGG GPQG G G QGPK 

Sbjct: 82 KEEDFQKELKDFTEKRLKEILDLIGKSGIK GDRGETGPAGPAGPQGKTGERGAQGPK 138 


Query: 910 GD---QGIQGPAGADGKATYTHIAYALDENGSTGFS VSDNVGKTYIGMYVDDNI ID 962 

GD QGIQG AG G+ EG G + GK D 
Sbjct: 139 GDRGEQG1QGKAGEKGERGEKGDKGETGERGEKGEAGIQGPQGEAGK DGAPGK 191 

Query: 963 SNDPKKYKWNLIKGADGARGIQGPAGADGKT 993

P + +G GA+G GP G G+T 

Sbjct: 192 DGAPGEKGEKGDRGETGAQGPVGPQGEKGET 222 
Identities = 25/50 (50%) , Positives = 29/50 (58%) , Gaps = 9/50 (18%) 

Query: 884 EDGADGKDGAPGPQGPPGVNGL QGPKGDQGIQGPAGADGKA 924

+DGA GKDGAPG +G G G QG KG+ G QGPAG G+A 

Sbjct: 185 KDGAPGKDGAPGEKGEKGDRGETGAQGPVGPQGEKGETGAQGPAGPQGEA 234 

SEQ ID 1400 was expressed in four different forms. SDS-PAGE analysis of total cell extract is shown in
Figure 122 (GBS105dN - lanes 5 & 7; MW 102kDa), Figure 122 (GBS105dC - lanes 8-10; MW 81kDa),
Figure 179 (GBS105d - lane 8; MW 102kDa) and in Figure 181 (GBS105C - lane 2; MW 56kDa).
GBS105dN-His was purified as shown in Figure 232 (lanes 9 & 10). GBS105dC-His was purified as shown
in Figure 233 (lanes 3 & 4).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 438 

A DNA sequence (GBSx0474) was identified in S.agalactiae <SEQ ID 1403> which encodes the amino 
acid sequence <SEQ ID 1404>. This protein is predicted to be a minor structural protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2502 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC34412 GB:AF158600 putative minor structural protein
[Streptococcus thermophilus bacteriophage Sfi11]

Identities = 163/433 (37%), Positives = 244/433 (55%), Gaps = 21/433 (4%)

Query: 80 LSSKKPKMLMFSHIPGRYYLAVQVGDLNFKEIKMNGFGEIT--FIVADAYAHSTSYRRIK 137
L +KK L P RYYLA+ G+++ K I + + E T F+V D AHST+Y+R+
Sbjct: 93 LHTKKAVKLFLPTEPERYYLALVKGEVSLKGIS-DWYDEATIEFLVPDGVAHSTTYKRVT 151

Query: 138 DYTQDGNKMTFKIKKNGTAPAFPIFRIKHLGENGYIGITNETGAFAVGSPEEEDGTIVHR 197
DY + KM FIN G+ A+PI +K ENGY G+ ++ AF G+ EE DG 1+ +
Sbjct: 152 DYQEKDGKMIFSIDNEGSTDAYPIITLKANAENGYYGLVSDKFAFFAGNIEEADGKIISK 211

Query: 198 NETLFDY-SKAIAQAL-EGAPOTAKLNYMPPTFDSELKRMRLDNILGSGKGGEYVAIGAR 255
E L+D+ I QA +GA NVN + + + N+ G IG +
Sbjct: 212 AEVLYDFRDDRI PQAFAKGAKNVGITNVTGDLHGT LEIQNVWGRPH IGLK 261

Query: 256 GTTPGYGE -HVGTRTFI INPDSNGEY- TLNEHLWWKQI FIATAQDQKGFLKLCVTGENDE 313
+ + T I PDS+G LNE++WW+QIF A + Q GFLKL V+ +
Sbjct: 262 NPNANINQLQTASLTLDIPPDSSGNVGALNEYIWWRQIFWAGSISQYGFLKLTVSDADGN 321

Query: 314 FLYGIETYKRKNGFETEYNFFALDDDGVGWRFYKQFEFQA-DRNYHNPFSMNRSRAVEIF 372
FLYG+ET+KR G E+EYN A D G G+RF KQ+ F A + HNPF+ R + +1
Sbjct: 322 FLYGVETFKRSLGLESEYMALASDGYG-GFRFLKQWSFLATEYEDHNPFNEPRGWS-DIK 379

Query: 373 REEDKFRIYFNGAHHHVTVPSLKGKKSRKIHIAMGTCSDSSKYIlSryOTjFEKVNFEKMGVS 432
RE+DK Y+ G ++ T+P +KGKKS KIHL + S ++ + F+++ + K +
Sbjct: 380 REDDKVTFYWWGTYNTFTIPEIKGKKSAKIHLTISNI-PSKSFVTHAYFDQLLYIKTNNA 438

Query: 433 HYNNIVNKYQPGDEVI INFENDTVSTKDIDSIQDWLGSKMISIPPGESELWHLSSWVA 492
+ +1 N+Y G +IIN E+DT++ ++ ++ ++V GS IPPGES++ V S W
Sbjct: 439 FFEDIPNRYIQGSNLIINSEDDTLTLJINI.LNLDEIVDGSLWPVIPPGESQIEWQSPWAK 498

Query: 493 ALPDISIDFEERY 505
P ++I+FEER+
Sbjct: 499 KKPSVTIEFEERW 511

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
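
The summary line that precedes each alignment (for example "Identities = 163/433 (37%)") gives a match count, an alignment length and a percentage. The following is a minimal illustrative sketch, not part of the original analysis, of how such a line could be parsed and the percentages recomputed for checking; the function name parse_summary and the tolerance for stray spaces before commas are assumptions introduced here.

```python
import re

# One "Name = matched/length (pct%)" field of an alignment summary line.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")

def parse_summary(line):
    """Return {field: (matched, length, reported_pct, recomputed_pct)}.

    Hypothetical helper: it only re-derives matched/length as a percentage
    so that the reported integer can be compared against it.
    """
    stats = {}
    for name, matched, length, pct in FIELD.findall(line):
        m, n = int(matched), int(length)
        stats[name] = (m, n, int(pct), 100.0 * m / n)
    return stats

line = "Identities = 163/433 (37%) , Positives = 244/433 (55%) , Gaps = 21/433 (4%)"
for name, (m, n, reported, exact) in parse_summary(line).items():
    print(f"{name}: {m}/{n} reported {reported}%, recomputed {exact:.1f}%")
```

The reported values are whole numbers, so a recomputed value may differ from the reported one by up to about one percentage point; larger differences would suggest a transcription error in the counts.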

Example 439 

A DNA sequence (GBSx0475) was identified in S.agalactiae <SEQ ID 1405> which encodes the amino 
acid sequence <SEQ ID 1406>. This protein is predicted to be PblA. Analysis of this protein sequence 
reveals the following: 

Possible site: 57

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.11 Transmembrane 427 - 443 ( 424 - 445)
INTEGRAL Likelihood = -4.99 Transmembrane 449 - 465 ( 448 - 469)
INTEGRAL Likelihood = -2.71 Transmembrane 41 - 57 ( 38 - 57)
INTEGRAL Likelihood = -0.37 Transmembrane 361 - 377 ( 361 - 377)
INTEGRAL Likelihood = -0.22 Transmembrane 324 - 340 ( 324 - 340)

Final Results 

bacterial membrane Certainty=0.3845 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG18638 GB:AY007505 PblA [Streptococcus mitis]
Identities = 233/401 (58%), Positives = 296/401 (73%), Gaps = 17/401 (4%)

Query: 1 MATNLGQAYVQIMPSAKGISGSISKTLDPEASSAGSSAGSLLGGKLIGILGSVIAAAKIG 60 
MAT + QAYVQ++PSA+GI+G I L+PEAS+AG SAG LG L+G++ VIAAA IG
Sbjct: 1 MATEIAQAYVQLIPSARGITGKIQSILNPEASAAGQSAGQSLGSSLVGVMTKVIAAAGIG 60

Query: 61 EiVIVTKAISSSISEGAALQQSLGGVETLFKSNANLVKKYADEAYKTTGLSANAYMESVTGF 120 

KA.S++ISEGAALQQSLGG+ETLFK +A+ VK YA+EAYKTTGLSANAYME+VTGF 
Sbjct: 61 KAFSAAISEGAALQQSLGGIETLFKGSADKVKGYANEAYICrTGLSANAYMENVTGF 116 

Query: 121 SASLLQSLGGDTAKAAKVANMAMIDMADNSNKMGTS^SIQYAYQGFAKQOTTMLDNLKL 180 

SASLLQSLGGDT KAA+ ANMAMIDM+DN+NKMGTSMESIQ AYQGFAKQNYTMLDNLKL 
Sbjct: 117 SASLLQSLGGDTNKAAETANMAMIDMSDNANKMGTSMESIQMAYQGFAKQNYTMLDNLKL 176 

Query: 181 GYGGTQEEMKRLLSDAQKLTGKKYDISNLSDVYEAIHAIQGKIGITGTTAKEAATTFTGS 240 

GYGGT++EM+RLL+DA+KLTG KYDI+NLSDVY AIHAIQ + ITGTTAKEAA+TF+GS 
Sbjct: 177 GYGGTKQEMQRLLADAEKLTGVKYDINNLSDVYSAIHAIQENLDITGTTAKEAASTFSGS 236 

Query: 241 FEAMKAASKMltLGKMALGEDIKPSLKALPDTTSNFVLMMFIPMLTNVFKGFGSVISLTFS 300 

FE+MKAA++N+LGK+ALGE+I PSL AL TTS F+ +NF+PM+ NVF G G V++ S 
Sbjct: 237 FESMKAAAQNVLGKIJULGENILPSLHALLKTTSTFLFDNFLPMIGNVFSGLGLVLTEGIS 296 

Query: 301 ELI PKIVGFMQTSGPSLMQSGI SFI I SFVNGFLTAYPAFLTVAGKI FTDFVS FVMQS IPG 360 

++ ++ G S + +S + G + F + G + ++ +1 G 

Sbjct: 297 QIASQLFG DAFGSAVFDQLSRITGIFETF--FDMIFGSLSKQDNIDILNTI-G 346 

Query: 361 LLQAGATLVLNLIDGILANLPQIATS AVSVISSFISML 398 

+ AT ++N+ D I I ++ V ++ F+ L 

Sbjct: 347 FSEEAATQIVNIADNIRVTFENIGSAIGDVVGIVGDFVGDL 387 
Identities = 112/386 (29%) , Positives = 172/386 (44%) , Gaps = 18/386 (4%) 

Query: 235 TTFTGSFEAMKAASKNLLGKMA-LGEDIKPSLKA LFDTTSNFVIjNNFIPMLTNVFKG 290 

TT+ E++KA ++ + L E IK + L T V+ FI N++ 

Sbjct: 580 TTWNAYVESLKAMWNAVVTFFSDLWESIKEAASTAWTLITTAV^ 639 

Query: 291 FGSVISLTFSELIPKIVGFMQTSGPSLMQSGISFIISFVNGFLTAYPAFLTVAGKIFTDF 350
++ + + G + S+ I II V G A L++ + + 

Sbjct: 640 ISEGLTQVWEGIKLIFEGAWEFI-KSIFLGAILIIIDLVTGNFGQLGADLSLIWEGIKNG 698 

Query: 351 VSFVMQSIPGLLQAGATLVIiNLIDGIL^LPQIATSAVSVISSFISMLQANYPAILKKGF 410 

+S + + I +++ G+ N + ++ I + SM +1 
Sbjct: 699 ISLIWEGIKTYFSGWDVIVGYATGVFENFSNVLSTIWEFIKTAASMA WEWIKSTVS 755 

Query: 411 EILSYLVQGIIARLPDIVITVGKL IAILAGAIASNLPKVLALGVQLLITFVKGILSV 467 

+++ L+QG +V + L I AASLKLLG + VG + 

Sbjct: 756 NLITGLIQGAQNLWNNFVSFLSGLWENIKSTASAAWSGL-KSLVLG- -FINGLVSGAQTA 812 

Query: 468 IGKINETANNIGEK LINAIKSIDLLSAGRAIMRGFLRGLEDVWGDIQNFVGDIAGWI 524 

+ + +++ K + N IK+I+L AG+AI+ GFL GL+ W + NFVG IA WI 
Sbjct: 813 WNNMKQAVSDLVTKVTNIFNGIKNINLWEAGKAILNGFLGGLKSAV^ 872 

Query: 525 KDHKGPISYDRRLLIPAGNAIMQGLHQGLVDKFKPVKNLVNGMAEEIQSSFGNPQLAFDM 584 

+DHKGPI YDR+LLI PAGNAIM L GL D FK VK V GM+ EI F h + 
Sbjct: 873 RDHKGPIEYDRKI.LIPAGNAIMGSLDNGLKDGFKDVKKTVGGMSGEISDVFSGDNLDLNS 932 

Query: 585 DTNVNNGFE-RIGTLNKNLSSQVTST 609 

+V E R+ + L Q + T 
Sbjct: 933 TASVTKNLEARIAMPSAQLEVQESKT 958 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1407> which encodes the amino acid 
sequence <SEQ ID 1408>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.76 Transmembrane ... ( 458 - 474)
INTEGRAL Likelihood = ... Transmembrane 483 ... ( 482 ...
INTEGRAL Likelihood = -2.02 Transmembrane ... - 445 ( 429 - 445)
INTEGRAL Likelihood = -1.28 Transmembrane 397 - 413 ( 397 - 413)
INTEGRAL Likelihood = -0.53 Transmembrane 739 - 755 ...
INTEGRAL Likelihood = -0. ... Transmembrane 356 ... ( 356 ...

Final Results

bacterial membrane Certainty=0.2105 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:AAB18717 GB:U38906 ORF42 [Bacteriophage r1t]
Identities = 261/579 (45%), Positives = 359/579 (61%), Gaps = 63/579 (10%)

Query: 184 MKRLLSDAEKLPAAMGKKFDLSNYADVVEAIHLVQDNMGIAGVAAEEAKTTFSGSLAAMK 243
M+RLL+DA+KL G+K+D+SN++D+ +AIH +Q M 1 G A+FA TTFSGS +MK 
Sbjct: 1 MQRLLTDAQKLT GQKYDISNFSDITQAIHAIQTEMDITGTTAKEASTTFSGSFDSMK 57 

Query: 244 SSFTNWIAGLSLGDDIRPALRGLAETTSNFLFGNFIPMVANIFKGLPSAIGTFIGAAR.pl 303 
++ +NV+ LSLG D++ L L TTS FLF NFIPMV NIFK LP AI TF+ AA

Sbjct: 58 AAMSNVLGNLSLGRDLQGPLNALVSTTSTFLFKNFIPMVGNIFKALPGAISTFVSAAGKE 117 

Query: 304 ITSQ FQGLMSSLG- IS1DLSPIT 325 

++SQ F L+SS+G IS + + 

Sbjct: 118 LSSQLGNGIGSGFSDFTAKFSSILSPLQGSFQTIVSGLKPVFDSLLSSIGPISTQIMGVF 177






Query: 326 AKFAQIGQNLQ PVFNGLKTAFSQLPSFFTSIGSAVAPVIDTIISGLARLDFSGFEA 381 

+K Q+ N+ PV + L AF QLPS F +1 AV P+IDTI SG++RLDFSG +A 

Sbjct: 178 SKLPQLFSNVISAVIPVISTLSVAFGQLPSLFEAISVAVQPMIDTISSGISRLDFSGIQA 237 

Query: 382 LISAILPALQAGFSNFAAIVGPAISGWDSFVGMWNAAQPLISILSDALMPVFQILGSFL 441 

+ISA++PA+ G + I+GP+I +V+SFV MWN+ QPL ++++ ALMP FQ+LG+F+ 
Sbjct: 238 I ISAL VPAITTGITTMMGI IGPS IDTLVNSFVKMWNSIQPLATVIAGALMPAFQVLGAFI 297 

Query: 442 GGWKGAIMGVSFAFDAVKvAIQLVTPIIDLLV^ 501

GGV+KGA++ +S FD ++V + +TPII ++ PVL+ +A+W+G AIG F N 

Sbjct: 298 GGVTjKGAMIiALSATFDTIRVWGFLTPIIAAvT^FQEFAPVLATVAQWVGTAIGFFANF 357 

Query: 502 GTAGQGLSAFIKSAVmilQTAISTAGTIISWIDYIKLAFSGAGSAVGvLKNIFSLAWMA 561 
G AG L I SAW I++ IS+ + I +1+ K F+G GSA G L+++ S AW

Sbjct: 358 GAAGTSLKGLITSAWNGIKSIISSWSGIGGIINTAKAIFTGLGSAGGALRSMISGAWSG 417 

Query: 562 MGDAINVAKGI ISSVINGIKSAFSSFS SLVSSVGSAVNGVIDSISSTIRG- - - 611 

+ 1+ G IS INGIKS FSS S++S V S + G+I SSTI G 

Sbjct: 418 IRSIISSVGGSISGTINGIKSFFSSLGGSGNGLRSVMSGVWSGITGIISGASSTISGIID 477

Query: 612 LANIDISGAGAAIMNGFLNGLKSAWGAWSFVSGIANWIAEHKGPISYDRVL 663 

L NID++GAG A+ ++GF+ GLKS W A K FV GIA+WI +HKGPISYDR + 
Sbjct: 478 GIKNIFNSLKNIDLAGAGRAVIDGFVGGLKSTWEAGKKFVGGIADWIKDHKGPISYDRKI 537 


Query: 664 LKPAGKAIMGGLNTSLIDGFKEVKSNVSGMADDLASTMT 702 

L PAG+AIMGG N SL++ FK V+ NVSG+A + S +T 
Sbjct: 538 LIPAGQAIMGGFNDSLMENFKAVQKNVSGIAKQIQSAIT 576 

An alignment of the GAS and GBS proteins is shown below:

Identities = 272/701 (38%) , Positives = 371/701 (52%) , Gaps = 91/701 (12%) 

Query: 1 MATNLGQAYVQIMPSAKGISGSISKTLDPEASSAGSSAGSLLGGKLIGILGSVIAAAKIG 60 
MAT LGQAYVQIMPSA+GISG+ISK LDPEA SAG SAGSL+GG L+ ++G I AAA IG 
Sbjct: 1 MATELGQAYVQIMPSARGISGAISKQLDPEARSAGLSAGSLIGGNLVKMIGGAIAAAGIG 60

Query: 61 EMVTKAI SSS I SEGAALQQSLGGVETLFKSNANLVKKYADEAYKTTGLSANAYMES VTGF 120 

+M ISS++S GA LQQS GG++TL+K VK +A EAYK G+SAN Y E 

Sbjct: 61 KM ISSALSAGADLQQSFGGIDTLYKGAETAVKGFAKEAYKA-GISANTYAEQAVSM 115 






Query: 121 SASLLQSLGGDTAKAAKVANMAMIDMADNSNKMGTSMESIQYAYQGFAKQNYTMLDNLKL 180 

ASL QSLGGD AAK ANMA+ +DMADNS KMGT + S1Q AYQGFAKQNYTMLDNL+L 
Sbjct: 116 GASLKQSLGGDAVAAAKAANMAIMDMADNSAKMGTDITSIQMAYQGFAKQNYTMLDNLRL 175 



Query: 181 GYGGTQEEMKRLLSDAQKL TGKKYDISNLSDVYEAIHAIQGKIGITGTTAKEAATTF 237

GYGGT+EEMKRLLSDA+KL GKK+D+SN +DV EAIH +Q +GI G A+EA TTF
Sbjct: 176 GYGGTKEEMKRLLSDAEKLPAAMGKKFDLSNYMV\^IHLVQDNMGIAGVAAEEAKTTF 235

Query: 238 TGSFEAMKAASKNLLGKMALGEDI KPSLKALFDTTSNFVIiNNF I PMLTNVFKGFGSVI SL 297 

+GS AMK++ N++ ++LG+DI+P+L+ L +TTSNF+ NFIPM+ N+FKG S I 
Sbjct: 236 SGSIlAAMKSSFTlTO^GLSLGDDIRPALRGLAETTSNFLFGNFIP^W■ANIFKGLPSAIGT 295 

Query: 298 TFSELIPKIV GFMQTSGPSLMQSGISFIISFV NGFLTAY PAFLTV 342 

PI GM+GS+SI+ ++ NG TA+ P+F T 

Sbjct: 296 FIGAAAPIITSQFQGLMSSLGISIDLSPITAKFAQIGQNLQPVFNGLKTAFSQLPSFFTS 355 

Query: 343 AGKI FTDFVSFVMQS I PGL LQAGATLVTjNLIDGILANLPQIATSAVS-VISSFISM 397 

G + ++ + L +A + +L + +N I A+S V+ SF+ M 

Sbjct: 356 IGSAVAPVIDTIISGLARLDFSGFEALISAILPALQAGFSNFAAIVGPAISGWDSFVGM 415 

Query: 398 LQANYPAI LKKGFE I LSYL VQGI IARLPDIVIT 430 

API L F+IL + G+ + + D+++ 

Sbjct: 416 WNAAQPL IS I LSDALMPVFQ ILGS FLGG WKGALMG VS FAFDAVKVAIQLVTP I IDLLVQ 475 

Query: 431 VGKLIAIIAGAIASNLPKVLALGV--QLLITFVKGILSVIGKINETANNIGEKLIN 484 

V +++++A I + LG Q L F+K +1 TA I +1+ 

Sbjct: 476 GIMFVQPVLSVIAEWIGVAIGMFGNLGTAGQGLSAFIKSAWTNIQTAISTAGTIISTVID 535 

Query: 485 AIKSI DLLSAGRAIMRGFLRGLEDVWGDIQNFVGDIA 521 

IK D ++ +1+ + G++ + + V + 

Sbjct: 536 YIKIAFSGAGSAVGVLKNIFSLAWMAMGDAINVAKGIISSVINGIKSAFSSFSSLVSSVG 595 

Query: 522 GWIKDHKGPISYDRRLLI PAGNAIMQGLHQGLVDKFKPVKNLVNGMAEEIQSSFG 576 

+ IS R L AG AIM G GIi + VK+ V+G+A I G 

Sbjct: 596 SAWGVIDSISSTIRGLANIDISGAGARIMNGFINGLKSAWGAVKSFVSGIANWIAEHKG 655 

Query: 577 NPQLAFDMDTNVNNGFERIGTLNKNLSSQOTSTnNYTSGNA 617 

+++D G +G LN +L + SG A 

Sbjct: 656 --PISYDRVLLKPAGKAIMGGLNTSLIDGFKEVKSNVSGMA 694 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
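
Each "Final Results" block above lists a certainty value for three candidate localizations (membrane, outside, cytoplasm). As an illustrative sketch only, and not part of the original analysis, the lines could be read programmatically as follows; the function name parse_localization is hypothetical, and the pattern is written on the assumption that stray spaces inside the numbers are transcription artifacts.

```python
import re

# Matches e.g. "bacterial membrane Certainty=0. 3845 (Affirmative)".
# Spaces inside the number are tolerated and removed before conversion.
LINE = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d\s.]+?)\s*\(([^)]+)\)")

def parse_localization(text):
    """Return a list of (site, certainty, verdict) tuples, in input order."""
    results = []
    for site, value, verdict in LINE.findall(text):
        certainty = float(re.sub(r"\s+", "", value))
        results.append((site, certainty, verdict.strip()))
    return results

block = """bacterial membrane Certainty=0. 3845 (Affirmative) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco"""

rows = parse_localization(block)
best = max(rows, key=lambda row: row[1])
print(best[0], best[1])
```

Taking the entry with the highest certainty reproduces the localization marked "Affirmative" in the block used here.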

Example 440 

A DNA sequence (GBSx0477) was identified in S.agalactiae <SEQ ID 1409> which encodes the amino
acid sequence <SEQ ID 1410>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2565 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG18637 GB:AY007505 unknown [Streptococcus mitis]
Identities = 64/119 (53%), Positives = 87/119 (72%), Gaps = 2/119 (1%)

Query: 1 MLKMDEDALVCDLAETYHIYDYKQLPPLKVAVFSLGLREESRINRVISGNRVSFERRILA 60

M++ DEDAL+CDLAETY I+DY+QLP +VAVF+ GLR++SRI ++ ++V FE +IA 
Sbjct: 1 MIQTDEDALICDLAETYGIFDYRQLPADQVAWAFGLRDDSRIKLAMTNSKVPFETFLLA 60 

Query: 61 GMFDRLGMLIWMKTTDGQKGKNRPEMVSTMF--DNQQKDSEWSFGSGKDFEETRNNIL 117 
G+ DRL L+W KTTDGQKG N+P MV+ + K+S+ + F SG+DFEE R IL 

Sbjct: 61 GVLDRLSALWFKTTDGQKGINKPLMVTEELTGKTKAKESKEMIFDSGEDFEEYRQKIL 119

A related DNA sequence was identified in S.pyogenes <SEQ ID 1411> which encodes the amino acid
sequence <SEQ ID 1412>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2905 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 60/123 (48%), Positives = 82/123 (65%), Gaps = 2/123 (1%)

Query: 1 MLKMDEDALVCDLAETYHIYDYKQLPPLKVAVFSLGLREESRINRVISGNRVSFERRILA 60

M+ D+DAL CDLAETY IYDY+QLP +VAVF++GLR SRI +SG + + +LA 
Sbjct: 1 MIAKDDDALTCDl^TYGIYDYRQLPAYQVAVFAVGIiRSNSRIKMALSGETEALDTVLIift. 60 


Query: 61 GMFDRLGMLIWMKTTDGQKGKNRPEMV- - STMFDNQQKDSEWSFGSGKDFEETRNNILG 118 

G++D +L W KT DGQ G+N+P+ V + QK ++V+SF SG+DFE R +LG 

Sbjct: 61 GIYDNTNbLFWSKTKDGQSGQNKPKSWEAISGSKSQKANDVISFVSGEDFENARKQLLG 120 

Query: 119 FGG 121

G 

Sbjct: 121 GDG 123 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 441 

A DNA sequence (GBSx0478) was identified in S.agalactiae <SEQ ID 1413> which encodes the amino 
acid sequence <SEQ ID 1414>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2280 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database: 



>GP:AAG18636 GB:AY007505 unknown [Streptococcus mitis] 
Identities = 40/80 (50%) , Positives = 62/80 (77%) , Gaps = 1/80 (1%) 

Query: 3 TSSGFEYKIEESRLKNYELVEALADLESNPLSLPKVLRLLLGDQVESLKNHLRASDGTVS 62 

TS+GF ++I + RL+NYEL+EA++++++NP LPKV++L+LG++ E LKNH+R +DG V 
Sbjct: 24 TSTGFPFEITKERLENYELLEAISEVDTNPAvLPKVVKLMLGNKSEDLKNHvRTADGIVP 83 

Query: 63 TEALMEEVKEIFES-GQLKK 81

+ + E+ EIF S QLKK 
Sbjct: 84 LDKMGAEISEIFSSQNQLKK 103 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1415> which encodes the amino acid
sequence <SEQ ID 1416>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4365 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below:

Identities = 42/75 (56%) , Positives = 60/75 (80%) 

Query: 2 KTSSGFEYKIEESRLKNYELVEALADLESNPLSLPKVLRLLLGDQVESLKNHLRASDGTV 61

KT+SGFEY+I + RLKN+ELVEA+A+ E++P ++ K++ LLLGD +SLK H+R ++G V 
Sbjct: 7 KTTSGFEYEIPKKRLKNFELVEAIAEEETDPTAvVKIVNLLLGDAAKSLKEHVRDAEGIV 66 

Query: 62 STEALMEEVKEIFES 76 

EA+ E+KEIFES 
Sbjct: 67 DVEAIGVEIKEIFES 81 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 442 

A DNA sequence (GBSx0479) was identified in S.agalactiae <SEQ ID 1417> which encodes the amino
acid sequence <SEQ ID 1418>. This protein is predicted to be a structural protein. Analysis of this protein
sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3461 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG18635 GB:AY007505 unknown [Streptococcus mitis] 
Identities = 114/183 (62%), Positives = 142/183 (77%)

Query: 2 VANSSNVTTAKPKIGGAIYTAPLGTELPKDTASELNEAFKSLGYISEDGLSNEDKRESEE 61
+A +NVTTAKPKIGGA+Y+APLGT LP D ++L++AF++LGYIS+DG++N + ESE
Sbjct: 1 MATEANVTTAKPKIGGAVYSAPLGTALPTDATTKLDQAFEALGYISDDGMTNSNSPESEN 60

Query: 62 IQAWGGDVVESAQKSKADKFTYTLIEALNIEvIjKEIYGKDNvTGDLKTGITVKSNSKPLE 121
I+AWGG W S QK K D F Y LIEALN+ VLKE+YG DNV+GDL +GIT+K+NSK L
Sbjct: 61 IKAWGGVWSSVQKEKTDTFKYMLIEALNLHVLKEVYGPDNVSGDLSSGITIKANSKELP 120

Query: 122 EHCLVIEMILKNNTVKRIVIPKGKVSEVGEIKYVDNEAflGYETTLQAFPDAEGNTHYEYI 181
HCLVIE +LK +KRIVIP GKV+ + EI Y D GY TT+ AFP+A +THYEYI
Sbjct: 121 HHCLVIETVLKGGVLKEIVIPSGKVTAIDEITYNDGSVLGYGTTVTAFPNARDDTHYEYI 180

Query: 182 KGA 184
KGA
Sbjct: 181 KGA 183

A related DNA sequence was identified in S. pyogenes <SEQ ID 1419> which encodes the amino acid 
sequence <SEQ ID 1420>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2379 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below: 

Identities = 119/182 (65%), Positives = 142/182 (77%)

Query: 4 NSSNVTTAKPKIGGAIYTAPLGTELPKDTASELNEAFKSLGYISEDGLSNEDKRESEEIQ 63

++ NVT+AKPK GGAIY+APLGTELPKD SELN FK+LGY+SEDG+ NED R SE 1+ 
Sbjct: 6 DTKNVTSAKPKTGGAIYSAPLGTELPKDAKSELNTKFKNLGYVSEDGVVNEDTRSSENIK 65 

Query: 64 AWGGDWESAQKSKADKFTYTLIEALNIEVLKEIYGKDNVTGDLKTGITVKSNSKPLEEH 123 

AWGGD+V + Q K DKFTY LIE+LN+EVLKE+YG NVTGDL GI +KSNSK LE H 
Sbjct: 66 AWGGDIVGAVQTEKEDKFTYKLIESLNWVLKEVYGAVNVTGDLSGGIQIKSNSKELEAH 125 

Query: 124 CLVIEMILKNNTVKRIVIPKGKVSEVGEIKYVDNEAAGYETTLQAFPDAEGNTHYEYIKG 183 

+V++MI+ +KRIV+P KV EVGEIKYVD E GYETTI1+ FPD +G+TH EYI 
Sbjct: 126 VIVVDMIIWGGILKRIVLPNAKVDEVGEIKYVDGEWGYETTLKCFPDKDGDTHREYIVK 185 

Query: 184 AG 185 
G 

Sbjct: 186 PG 187 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 443 

A DNA sequence (GBSx0480) was identified in S.agalactiae <SEQ ID 1421> which encodes the amino 
acid sequence <SEQ ID 1422>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2214 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB18710 GB:U38906 ORF35 [Bacteriophage r1t]
Identities = 52/78 (66%), Positives = 66/78 (83%)

Query: 1 MSKFKFKLNKAGVAELMKSSEMQQVLTTKATAIRERCGDGYAQDIHVGKNRANAMVSAKT 60 

M+K FKLN++GVA +MKS EMQ +L KA+A+++RCG GY QD+HVGKNRANAMV A+T 
Sbjct: 1 MRKNLFKLNRSGVAS^KSPEMQAILKEKASAVKQRCGPGYGQDMHVGKNRANAMVFAET 60 

Query: 61 IKAKKDNSKNNTLLKAVR 78 

+AK+DN KNNT+LKAVR 
Sbjct: 61 YQAKRDNMKNNTILKAVR 78 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1423> which encodes the amino acid 
sequence <SEQ ID 1424>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2446 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 75/78 (96%) , Positives = 76/78 (97%) 

Query: 1 MSKFKFKLNKAGVAELMKSSEMQQVLTTKATAIRERCGDGYAQDIHVGKNRANAMVSAKT 60

MSKFKFKLN+AGVAELMKSSEMQQVLTTKATAIRERCGDGY QDIHVGKNRANAMVS KT 
Sbjct: 1 MSKFKFKtjNRAGVAELMKSSEMQQVLTTKATAIRERCGDGYVQDIHVGKMRANAWSTKT 60

Query: 61 IKAKKDNSKNNTLLKAVR 78

IKAKKDNSKNNTLLKAVR 
Sbjct: 61 IKAKKDNSKNNTLLKAVR 78 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 444 

A DNA sequence (GBSx0481) was identified in S.agalactiae <SEQ ID 1425> which encodes the amino 
acid sequence <SEQ ID 1426>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2888 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB18709 GB:U38906 ORF34 [Bacteriophage r1t]
Identities = 41/59 (69%), Positives = 45/59 (75%)

Query: 1 MTGKKVEYILAIPKGDKHDWEDKOTCFFDKKWRTVGI^ 59 

+TGKK Y LAIPK D HDWE+K+V FF K WRT G LEGIE LIPL+WNKKV VE Y 
Sbjct: 56 LTGKKAIYTLMPKKDTHDWFJSTKKVR 114 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1427> which encodes the amino acid 
sequence <SEQ ID 1428>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2779 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below: 

Identities = 51/60 (85%) , Positives = 57/60 (95%) 

Query: 1 MTGKKVEYILAIPKGDKHDWEDKEVCFFDKKTOWGI^EGIEELIPLEWNKKVMVERYE 60 

+TGKKVEY+LAIPKGD+HDWE+KEV FF KKWRTVG+ LEGIEELIPL+WNKKVMVERYE 
Sbjct: 50 LTGKKVEYVIAIPKGDEHDWENKETOFFGKKMJTVGIPLEGIEELIPLDWNKKVMVERYE 109 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 445 

A DNA sequence (GBSx0482) was identified in S.agalactiae <SEQ ID 1429> which encodes the amino 
acid sequence <SEQ ID 1430>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2770 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB18708 GB:U38906 ORF33 [Bacteriophage r1t]
Identities = 89/130 (68%), Positives = 106/130 (81%), Gaps = 1/130 (0%)

Query: 1 MTNFATTDDVILLWRQLSVDE I KRAEALLETVSDTLRLEASKVGKNLDEMI LETP - YFAT 59 

M FAT DD+ +LWR L DE +RAE LLE VSD+LR EA KVG++L MI E P YFA+ 
Sbjct: 1 mPFATVDDLTMLWRPLKGDEKERAEKLLEIVSDSLREEADKVGRDLYAMIAEKPSYFAS 60 

Query: 60 VLKSVTVDIVARTLMTATQGEPMSQESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKK 119

V+KSVTVDIVARTLMT+T EPM+Q ++SALGY+ SG+YL VPGGGLFI K+ SEL RLGLKK 
Sbjct: 61 WKSVTVDIVARTLMTSTDQEPMTQTTESALGYSVSGSYLVPGGGLFIKNSELSRLGLKK 120 

Query: 120 QRYGGIELYG 129 
QR+G I+ YG

Sbjct: 121 QRFGVIDFYG 130 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1431> which encodes the amino acid
sequence <SEQ ID 1432>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2061 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below: 

Identities = 116/138 (84%), Positives = 129/138 (93%) 

Query: 3 NFATTDDVILLWRQLSVDEIKRAEALLETVSDTLRLEASKVGKNLDEMILETPYFATVLK 62

NFATTDDVI LLWR LSVDE+KRA ALL+ VSDTLR+EA KVGK+LD+ +++ PYF V+K 
Sbjct: 3 NFATTDDVILLWRPLSVDELKRANALLKVVSDTLRMEADKVGKDLDKTMVDKPYFVNVIK 62



Query: 63 SVTVDIVARTLMTATQGEPMSQESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKKQRY 122

SVTVDIVARTLMT+T+GEPM+QESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKKQRY 
Sbjct: 63 SVTVDIVARTLMTSTRGEPMAQESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKKQRY 122 

Query: 123 GGIELYGEIERNNSYFSR 140 
GGIELYGEIER+NS FSR

Sbjct: 123 GGIELYGEIERDNSCFSR 140 
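
Each Query or Sbjct row carries its own start and end coordinates, so the number of residues on the row (ignoring gap characters) should equal end minus start plus one. This gives a simple consistency check on transcribed alignment rows. The sketch below is illustrative only; the function name `row_is_consistent` is hypothetical, and the row layout ("Query: start SEQUENCE end") is assumed from the alignments shown in this text.

```python
import re

# Assumed row layout: "Query: <start> <sequence> <end>" (same for "Sbjct:").
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def row_is_consistent(line):
    """Return True/False if the residue count matches the coordinates.

    Returns None when the line is not a well-formed alignment row, for
    example when the sequence has been split by stray spaces.
    """
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    start, sequence, end = int(match.group(2)), match.group(3), int(match.group(4))
    residues = sum(1 for char in sequence if char != "-")  # gaps do not count
    return residues == end - start + 1


if __name__ == "__main__":
    print(row_is_consistent("Query: 123 GGIELYGEIERNNSYFSR 140"))
```

A row that fails this check has most likely lost or gained characters during transcription, although the check cannot say which residue is affected.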

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 446

A DNA sequence (GBSx0483) was identified in S.agalactiae <SEQ ID 1433> which encodes the amino
acid sequence <SEQ ID 1434>. This protein is predicted to be a structural protein. Analysis of this protein
sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3015 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 
>GP:AAB18706 GB:U38906 Structural protein [Bacteriophage r1t]
Identities = 132/296 (44%), Positives = 189/296 (63%), Gaps = 8/296 (2%)

Query: 5 IKAGTLFKPELVTEIMSKVKGHSTLAKLSGQTPIPFNGVEQFVFNLDGNAQIVGEGEQKL 64
+ GTLF P LVT+++SKV G S++A+LS Q PIPFNG + F F +D +V E +K

Sbjct: 3 IiNKGTLFDPTLVTDLISKVAGKSSIARLSAQKPIPFNGEKVFTFTMDSEIDVVAESGKKT 62

Query: 65 GNTAKVTSKIIKPLKFVYQARMTDEFKYASEEKRmB^KHYADGFAKKMAEAFDIAAIHG 124
+ + + P+K Y AR++DEF YAS+E+++N L+ + DGFAKK+A D+ A HG
Sbjct: 63 HGGVTIAPQTMVPIKVEYGARISDEFMYASDEEKINILQEFNDGFAKKVARGIDLMAFHG 122

Query: 125 LEPRTMTDASFKATNSFDGVVTGNVIKYEADK--IDDN--IDAAVTTIVANGNDVTGIAL 180

+ PR T ++ TN FD VT K EA + D N I+ AV + DVTGIA+
Sbjct: 123 VNPRLGTASAVIGTNHFDSKVTQ KVEAPRGIADPNGAIENAVELLTGVDADVTGIAI 179

Query: 181 SPQAGQDMSKRKDKFDNVMYPEFRFGQRPSNFFNMTLDINKTLTMKGGTAKDDHAIVGDF 240 

+P ++K+KD DN ++PE ++G P + +D+NKT++ T + D AI+GDF 

Sbjct: 180 NPSFRSALAKQKDLQDNALFPELKWGATPDTINGLPVDVNKTVSDMSLTQR-DRAIIGDF 238 

Query: 241 QNMFKWGYAENIPMEIIEYGDPDGSGRDLKAYNEILLRTEAFIGWGILDEKAFSRV 296

N FKWGYA+ +P+E+I+YGDPD SG DLK YN++ +R E F+GWGILD F+RV 
Sbjct: 239 ANGFKWGYAKEVPLEVIQYGDPDNSGLDLKGYNQVYIRAELFLGWGILDATKFARV 294 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1435> which encodes the amino acid 
sequence <SEQ ID 1436>. Analysis of this protein sequence reveals the following:

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2772 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 133/298 (44%), Positives = 187/298 (62%), Gaps = 2/298 (0%)

Query: 1 MAESIKAGTLFKPELVTEIMSKVKGHSTLAKLSGQTPIPFNGVEQFVFNLDGNAQIVGEG 60

M +LF LV+++++KVKGHS+IAKLS Q PIPFNG ++F F LD + +V E 

Sbjct: 1 MGTETSKASLFDKHLVSDLINKVKGHSSLAKLSSQKPIPFNGSKEFTFTLDSDIDVVAEN 60

Query: 61 EQKLGNTAKVTSKIIKPLKFVYQARMTDEFKYASEEKRI^FLKHYADGFAKKMAEAFDIA 120 

+K +1 P+K Y AR++DEF YA+EE++++ LK + +GFAKK+A D+ 

Sbjct: 61 GKKTHGGLSLEPVTIVPIKVEYGARLSDEFLYATEEEKIDILKAFNEGFAKKLARGIDLM 120 

Query: 121 AIHGLEPRTMTDASFKATNSFDGVVTGNVIKYEADKIDDNIDAAVTTIVANGNDVTGIAL 180

A+HG+ PRT + TN FD VT V E++ D NI+AAV I + VTG+A+ 
Sbjct: 121 AMHGINPRTKKASDVIGTNHFDSKVTQVVKFTESEDADANIEAAWLIQGSEGVVTGLAM 180 

Query: 181 SPQAGQDMSK-RKDKFDNVMYPEFRFGQRPSNFFNMTLDINKTLTMKGGTAKD-DHAIVG 238 
+ ++K + MYPE +G P + + +N T+ A+ D I+G

Sbjct: 181 DTEFSTALAKVTNGEMGPKMYPEIAWGANPDSINGLKSSVNTTVGAGADEAESKDLVIIG 240 

Query: 239 DFQNMFKWGYAENIPMEIIEYGDPDGSGRDLKAYNEILLRTEAFIGWGILDEKAFSRV 296 
DF++MFKWGYA+ IPMEII+YGDPD SG+DLK YN+I LR EA+IGWGILD K+F+RV 
Sbjct: 241 DFESMFKWGYAKQIPMEIIKYGDPDNSGKDLKGYNQIYLRAEAYIGWGILDAKSFARV 298

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 447 

A DNA sequence (GBSx0484) was identified in S.agalactiae <SEQ ID 1437> which encodes the amino
acid sequence <SEQ ID 1438>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.2224 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9659> which encodes amino acid sequence <SEQ ID 9660>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18705 GB:U38906 ORF30 [Bacteriophage r1t]
Identities = 64/158 (40%), Positives = 101/158 (63%), Gaps = 8/158 (5%)

Query: 43 MSEFKVIETQEELDTIVKARIARERE----KYQDYDQLKTRVEELETENSSLQTALNDAK 98

MSE + +TQEEL+ I++ R+AR++E + DYD+LKT++ LE +N++ Q + ++K 

Sbjct: 1 MSENNLPKTQEELNQIIETRLARQKETIEANFADYDELKTKIAALEADNTAYQATIEESK 60 

Query: 99 SNTDSYTEKITTLENQIAGYEAANLRTKVALQYGLPIDLANRLQGDDEDGLKVDAERLAS 158
S + ++ E QI+GY+ L+ +A++ GLP+DLA+RL GDDE+ LK DAER +

Sbjct: 61 S WEQEKADYEKQISGYKTTQLKQSIAIKAGLPLDLADRLSGDDEESLKADAERFSG 116 

Query: 159 FIKPSQPQPPTKSNEPIITDQKEAGWIEMARNLVNKGE 196 
FIKP P P K EP + D K+ + ++ L +GE 
Sbjct: 117 FIKPKTPPAPLKDVEPNLGDGKDGAYRKLVDGLKTEGE 154

A related DNA sequence was identified in S. pyogenes <SEQ ID 1439> which encodes the amino acid 
sequence <SEQ ID 1440>. Analysis of this protein sequence reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3476 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 128/149 (85%), Positives = 136/149 (90%)

Query: 43 MSEFKVIETQEELDTIVKARIAREREKYQDYDQLKTRVEELETENSSLQTALNDAKSNTD 102

MSEFKVIETQEELDTIVKARIAREREKYQDYDQLKTRVEELETENSSLQTALNDAKSNTD 
Sbjct: 1 MSEFKVIETQEELDTIVKARIAREREKYQDYDQLKTRVEELETENSSLQTALNDAKSNTD 60 

Query: 103 SYTEKITTLENQIAGYEAANLRTKVALQYGLPIDLANRLQGDDEDGLKVDAERLASFIKP 162
SYTE+I+TL+NQIA YE ANLRTKVALQYGLPIDLA+RLQGDDEDGLKVDAERLASFIKP

Sbjct: 61 SYTEEISTLKNQIADYETANLRTKVALQYGLPIDLADRLQGDDEDGLKVDAERLASFIKP 120

Query: 163 SQPQPPTKSNEPIITDQKEAGWIEMARNL 191
SQPQPP KSNEP I +A + + + L
Sbjct: 121 SQPQPPAKSNEPNIDSNADANYRALVQGL 149

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 448 

A DNA sequence (GBSx0485) was identified in S.agalactiae <SEQ ID 1441> which encodes the amino
acid sequence <SEQ ID 1442>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2888 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18704 GB:U38906 ORF29 [Bacteriophage r1t]
Identities = 322/461 (69%), Positives = 383/461 (82%)

Query: 8 KLGNQRPTQSVNLHFAKTIAHEAINYYKKTGLSCYLWQENMLIPMMAINEDNLWVHQKYG 67
+ GNQ PTQSV L F +T EAI Y+K+ CY WQ+N+L +MAI+ED LW HQK+G
Sbjct: 6 RECjNQYPrQSVlIjPETETKYQEAIEI 65

Query: 68 YAIPRRNGKTEVVYILELWALHKGLKILHTAHRISTSHSSFEKVKKYLEMSGYVDGEDFI 127
Y+IPRRNGKTE+VYILELW+L +GL ILHTAHRISTSHSS+EK+KKYLE SGYV+GEDF
Sbjct: 66 YSIPRRNGKTEIWILELWSLVQGLSILHTAHRISTSHSS 125

Query: 128 SNKAKGQERIEFKSSGSVIQFRTRTSNGGLGEGFDLLIIDEAQEYTAEQESALKYTVTDS 187
S KAKGQER+E SG VIQFRTRTS+GGLGEGFD+L+IDEAQEYT EQESALKYTVTDS
Sbjct: 126 SIKAKGQERLELIESGGVIQFRTRTSSGGLGEGFDILVIDEAQEYTTEQESALKYTVTDS 185

Query: 188 DNPMTIMCGTPPTMVSTGTVFESYRKECLKGDRRYSGWAEWSVDEMQPIHDVKSWYVANP 247
DNPMTIMCGTPPT VS+GTVF +YR + G +YSGWAEWSV++++ IHDV++WY +NP
Sbjct: 186 DNPMTIMCGTPPTPVSSGTVFTNYRDNTIAGKAKYSGWAEWSVEDVKDIHDVEAVIYNSNP 245

Query: 248 SMGYHLNERKIEAELGEDEIDHNIQRLGYWPSFNQKSVISEKEWAKLKVEQVPELKSKLF 307
SMGYHLNERKIEAELGED++DHN+QRLGYWP +NQKSVISE+EW LKV ++P +K KLF
Sbjct: 246 SMGYHLNERKIEAELGEDKLDHNVQRLGYWPKYNQKSVISEQEWNALKVNRLPVIKGKLF 305

Query: 308 VGIKFGQDGNNVSLSIAARASENKVFVEAIDCLSVRNGTQWIINFLKSADIAKVVVDGAS 367
VGIK+G DG NV++SIA + KVFVE IDC S+RNG QWIINFLK AD+ KVV+DG S
Sbjct: 306 VGIKYGNDGANVAMSIAVKTLSGKVFVETIDCQSIRNGNQWIINFLKKADVEKVVIDGQS 365

Query: 368 GQELLAQEMREHGLKKPELPKVAEIITANTMWEQGIMQETICHNDQPSLTAVVTNCEKRQ 427
GQ +L EM++ LK+P LP V EII AN++WEQGI Q+ CH+ QPSL+ VVTNC+KR
Sbjct: 366 GQSILTSEMKDFKLKEPILPTVKEIINANSLWEQGIFQKNFCHSGQPSLSTVVTNCDKRN 425

Query: 428 IGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQR 468
IG++GGFGYKS +DD DISLMDSALLAHW C KPK+KQ+
Sbjct: 426 IGTSGGFGYKSQFDDMDISLMDSALLAHWACSNNKPKKKQQ 466

A related DNA sequence was identified in S. pyogenes <SEQ ID 1443> which encodes the amino acid
sequence <SEQ ID 1444>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3133 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 437/471 (92%), Positives = 459/471 (96%)

Query: 1 MVTKTKAKLGNQRPTQSVNLHFAKTIAHEAINYYKKTGLSCYLWQENMLIPMMAINEDNL 60

MVTKTK KLGNQRPTQSVNLHFAK+IAHEAINYYKKTGLSCY WQ NMLIP+MAI+E+ L
Sbjct: 6 MVTKTKTKLGNQRPTQSVNLHFAKSLAHEAINYYKKTGLSCYPWQVNMLIPIMAIDENGL 65

Query: 61 WVHQKYGYAIPRRNGKTEVVYILELWALHKGLKILHTAHRISTSHSSFEKVKKYLEMSGY 120

WVHQKYGYAIPRRNGKTEVVYI++LWALHKGLKILHTAHRISTSH+SFEKVKKYLEMSGY 
Sbjct: 66 WVHQKYGYAIPRRNGKTEVVYIVQLWALHKGLKILHTAHRISTSHASFEKVKKYLEMSGY 125 
Query: 121 VDGEDFISNKAKGQERIEFKSSGSVIQFRTRTSNGGLGEGFDLLIIDEAQEYTAEQESAL 180

VDGEDFISNKAKGQERIEFK+SG+VIQFRTRTSNGGLGEGFDLLIIDEAQEYT+EQESAL
Sbjct: 126 VDGEDFISNKAKGQERIEFKASGAVIQFRTRTSNGGLGEGFDLLIIDEAQEYTSEQESAL 185

Query: 181 KYTVTDSDNPMTIMCGTPPTMVSTGTVFESYRKECLKGDRRYSGWAEWSVDEMQPIHDVK 240

KYTVTDSDNPMTIMCGTPPTMVSTGTVFE+YRK+CLKG++RYSGWAEWSV EM I+DV
Sbjct: 186 KYTVTDSDNPMTIMCGTPPTMVSTGTVFEAYRKDCLKGNKRYSGWAEWSVPEMVKINDVS 245

Query: 241 SWYVANPSMGYHLNERKIEAELGEDEIDHNIQRLGYWPSFNQKSVISEKEWAKLKVEQVP 300
SWY++NPSMG+HLNERKIEAELGEDEIDHNIQRLGYWPSFNQKSVISEKEWAKLKVEQVP

Sbjct: 246 SWYISNPSMGFHLNERKIEAELGEDEIDHNIQRLGYWPSFNQKSVISEKEWAKLKVEQVP 305

Query: 301 ELKSKLFVGIKFGQDGNNVSLSIAARASENKVFVEAIDCLSVRNGTQWIINFLKSADIAK 360
ELKSKLFVGIKFGQDGNNVSLSIAAR SENKVFVE IDCLSVRNGTQWIINFLKSADIAK
Sbjct: 306 ELKSKLFVGIKFGQDGNNVSLSIAARTSENKVFVETIDCLSVRNGTQWIINFLKSADIAK 365

Query: 361 VVVDGASGQELLAQEMREHGLKKPELPKVAEIITANTMWEQGIMQETICHNDQPSLTAVV 420

VV+DGASGQELLAQEM++ GLKKPELPKVAEIITAN MWEQGIMQETICH+DQPSLTAVV
Sbjct: 366 VVIDGASGQELLAQEMKDQGLKKPELPKVAEIITAISMMWEQGIMQETICHSDQPSLTAVV 425






Query: 421 TNCEKRQIGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQRTSC 471 

TNCEKRQIGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQRTSC 
Sbjct: 426 TNCEKRQIGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQRTSC 476 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 449 

A DNA sequence (GBSx0486) was identified in S.agalactiae <SEQ ID 1445> which encodes the amino 
acid sequence <SEQ ID 1446>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2745 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 450 

A DNA sequence (GBSx0487) was identified in S.agalactiae <SEQ ID 1447> which encodes the amino 
acid sequence <SEQ ID 1448>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2568 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18703 GB:U38906 ORF28 [Bacteriophage r1t]
Identities = 124/250 (49%), Positives = 164/250 (65%), Gaps = 3/250 (1%)

Query: 2 VDDVLPKLLKSVQQDFEKHFGKSEVVAKAFAELQAKKATYKTVNEFAVEVGRLLSLALAN 61
++D+LP LL+ + QDF++ S+ + ++ L+ KKATY NEF VEVG++LS L
Sbjct: 1 MEDILPPLLEKINQDFDERAANSKKLKQSMELLKTKKATYIQANEFGVEVGQILSDVLGT 60

Query: 62 SVISDELPDGKMYYNIANRLVNDTLRHNYKLISDYAGDVQQNLNKQAKISLKIQRPPLNQ 121
V D LPDGKMY+NIA+RL+N L+ N+ LIS Y+ DVQ LN+ A LK Q P LNQ
Sbjct: 61 HVTVDVLPDGKMYFNIADRLLNSILKKNFDLISGYSTDVQSELNQIAGFKLKSQVPELNQ 120

Query: 122 DKIDGLVNRLASEPVFDDVKWLLDEPIVNFSQSIVDDCIRANADFHFKTGLKPTIERIST 181
D+IDG+VNR++SE F+ + WLL EPIV FSQS+VDD ++ N DF K GLKP I R
Sbjct: 121 DRIDGIVNRISSEDDFEKILWLLKEPIVTFSQSVVDDTLKKNIDFQAKAGLKPKIVRKLV 180

Query: 182 GKCCDWCDRLAGRYVYHEEPKDFYKRHQHCQCVIDYHPK--NGKRQNSWSKKWTKETTDI 239
GK CDWC LAG Y Y P D Y RH+ C+C ++Y P+ + KRQ+ WSK W D
Sbjct: 181 GKACDWCRNIAGSYDYPNVPSDVYHRHERCRCTVEYDPRDIDKKRQDWSKNWVDPDKDA 240

Query: 240 -LERRKQMNI 248
+ RK +N+
Sbjct: 241 KIAERKNLNL 250

A related DNA sequence was identified in S. pyogenes <SEQ ID 1449> which encodes the amino acid
sequence <SEQ ID 1450>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3099 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 169/261 (64%), Positives = 207/261 (78%), Gaps = 2/261 (0%)

Query: 1 MVDDVLPKLLKSVQQDFEKHFGKSEVVAKAFAELQAKKATYKTVNEFAVEVGRLLSLALA 60
MVDDVLPKLLKSV+QDFEK+FG+S+VV KAFAELQAKK TYKTVNEFA+EVGRLLSLAL
Sbjct: 1 MVDDVLPKLLKSTOQDFEKYFGESDVVVKAFAELQAKKOTYKTVNEFAIEVGRLLSLALT 60

Query: 61 NSVISDELPDGKMYYNIANRLVNDTLRHNYKLISDYAGDVQQNLNKQAKISLKIQRPPLN 120
SV SD+LPDGKMYYNIA RL+++T+ NYKLIS YAGDVQ+ LN+ A+I LK+QRPPLN
Sbjct: 61 GSVSSDKLPDGKMYYNIAKRLLDETMGRNYKLISGYAGDVQRILNENAQIGLKVQRPPLN 120

Query: 121 QDKIDGLVNRLASEPVFDDVKWLLDEPIVNFSQSIVDDCIRANADFHFKTGLKPTIERIS 180
+DKI+G+VNRL SE FDDVKWL EPIVNFSQSIVDD I+ANAD +KTG+ P + R
Sbjct: 121 RDKINGMVNRLDSENTFDDVKWLFGEPIVNFSQSIVDDTIKANADLQYKTGMTPQVVRTE 180

Query: 181 TGKCCDWCDRIAGRYVYHEEPKDFYKRHQHCQCVIDYHPKNGKRQNSWSKKWTK--ETTD 238
+G CC+WC + G Y Y + PKD ++RHQ C+C +DY PKNGK Q++WSK W K +T +
Sbjct: 181 SGNCCEWCREVVGTYSYPKVPKDVWRRHQRCRCTLDYDPKNGKVQSAWSKIWRKKEKTQE 240

Query: 239 ILERRKQMNIDIRDNNRKSDI 259
+ER ++ + K+DI
Sbjct: 241 SIERVEKFKESALVESIKNDI 261

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.




Example 451 

A DNA sequence (GBSx0488) was identified in S.agalactiae <SEQ ID 1451> which encodes the amino
acid sequence <SEQ ID 1452>. This protein is predicted to be a structural protein. Analysis of this protein
sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.38 Transmembrane 93 - 109 ( 93 - 110) 

Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC39307 GB:AF022773 ORF5 [Lactococcus bacteriophage phi31]

Identities = 271/410 (66%), Positives = 326/410 (79%), Gaps = 2/410 (0%)

Query: 1 MNYMGMGYLQRKLALFKTGVDKRYRYYAMDDRDNTRSIVMPDNTOEmRSVIEWTAKGVD 60
M G+GYL+ KL++ K + RY YAM D + I +P + + YRS++ W AKGVD
Sbjct: 1 MTEKGIGYLRFKLSVHKRRAEmYEQYAMKHVDRFKGITIPQALSQQYRSILGWCAKGVD 60

Query: 61 SLADRIIFREFANDDFNAWEIFKANNPDIFFDTAIQSALIASCCFVYIMPGKEDSLPKMQ 120 

SLADR+IFREF NDDF EIF+ NNPDIFFD+A+ SALIASC F+YI G+ D++ ++Q 
Sbjct: 61 SLADRLIFREFENDDFTVNEIFEENNPDIFFDSAVLSALIASCSFIYISKGENDAV-RLQ 119

Query: 121 VIEASKATGILDPTTFLLTEGYAVLESDSNENPTLEAYFTGEKTWYYPKDEKP-YSIDNS 179

VIEA ATGI+DP T LLTEGYAVLE D N N LEA+F ++T YY +D + SI N
Sbjct: 120 VIEAVNATGIIDPITGLLTEGYAVLERDENNNVVLEAHFLPDRTDYYYRDSRNNISIANP 179

Query: 180 TGHPLLVPVIHRPDAVRPFGRSRITKAGMYHQKAAKRTLERAEVTAEFYSFPQKYVLGMD 239

TGHPLLVP+IHRPDAVRPFGRSRIT++GMY Q AKRTLERA+VTAEFYSFPQKYV G+
Sbjct: 180 TGHPLLVPIIHRPDAVRPFGRSRITRSGMYWQSNAKRTLERADVTAEFYSFPQKYVTGLS 239

Query: 240 PDAEPMEKWRATVSTLLEISKDEDGDKPTVGQFTTASMAPFMDHLKMYASLFAGGSGLTL 299
DAEPME W+ATVS++L+ +KDEDGDKPT+GQFT SM+PF + L+ A+ FAG +GLTL

Sbjct: 240 DDAEPMETWKATVSSMLQFTKDEDGDKPTLGQFTQPSMSPFTEQLRTAAAGFAGETGLTL 299 

Query: 300 DDLGFPSDNPSSVEAIKAAHENLRAAGRKAQRSFSSGFLNVAYIAVCLRDDFPYLRNQFM 359 
DDLGF SDNPSSVEAIKA+HENLR AGRKAQRS +G LNVAY+A CLRDD PYLR QF 
Sbjct: 300 DDLGFVSDNPSSVEAIKASHENLRLAGRKAQRSLGAGLLNVAYLAACLRDDVPYLREQFS 359

Query: 360 DTEIKWEPLFEADANMLTLVGDGAIKLNQAIPGFMDADVIRDLTGVKGSD 409 

T+ KWEPLFEADA+ML+L+GDGAIKLNQAIP F++ D IRDLTG+KG++ 
Sbjct: 360 KTKPKWEPLFEADASMLSLIGDGAIKLNQAIPEFINKDTIRDLTGIKGAE 409 


A related DNA sequence was identified in S.pyogenes <SEQ ID 1453> which encodes the amino acid 
sequence <SEQ ID 1454>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.38 Transmembrane 93 - 109 ( 93 - 110)

Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 395/422 (93%), Positives = 407/422 (95%)

Query: 1 MNYMGMGYLQRKLALFKTGVDKRYRYYAMDDRDNTO^ 60

MNYMGMGYL+RKLALFKTGVDKRYRYYAMDDRD+TRSIVMP+NTOE^ra^SV+EWTAKGVD
Sbjct: 1 MNYMGMGYLRRKLALFKTGVDKRYRYYAMD^^ 60

Query: 61 SLADRIIFREFANDDFNAWEIFKANNPDIFFDTAIQSALIASCCFVYIMPGKEDSLPKMQ 120 

SLADRIIFREF NDDFNAWEIFKANNPDIFFDTAIQSALIASCCFVYIMPG ED LPKMQ 
Sbjct: 61 SLADRIIFREFTNDDFNAWEIFKANNPDIFFDTAIQSALIASCCFVYIMPGAEDGLPKMQ 120 

Query: 121 VIEASKATGILDPTTFLLTEGYAVLESDSNENPTLEAYFTGEKTWYYPKDEKPYSIDNST 180 

VIEASKATGILDPTTFLLTEGYA+LESDSN NPTLEAYFT + WYYPK KPY+I N T 
Sbjct: 121 VIEASKATGILDPTTFLLTEGYAILESDSNGNPTLEAYFTDKDIWYYPKKGKPYNIKNPT 180 

Query: 181 GHPLLVPVIHRPDAVRPFGRSRITKAGMYHQKAAKRTLERAEVTAEFYSFPQKYVLGMDP 240

GHPLLVP+IHRPDAVRPFGRSRITKAGMYHQKAAKRTLERAEVTAEFYSFPQKYVLGMDP
Sbjct: 181 GHPLLVPIIHRPDAVRPFGRSRITKAGMYHQKAAKRTLERAEVTAEFYSFPQKYVLGMDP 240 

Query: 241 DAEPMEKWRATVSTLLEISKDEDGDKPTVGQFTTASMAPFMDHLKMYASLFAGGSGLTLD 300

DAEPMEKWRATVSTLLEISKDEDGDKPTVGQFTTASMAPFM+HLKMYASLFAGGSGLTLD 
Sbjct: 241 DAEPMEKWRATVSTLLEISKDEDGDKPTVGQFTTASMAPFMEHLKMYASLFAGGSGLTLD 300 

Query: 301 DLGFPSDNPSSVEAIKAAHENLRAAGRKAQRSFSSGFLNVAYIAVCLRDDFPYLRNQFMD 360
DLGFPSDNPSSVE+IKAAHENLRAAGRKAQRSFSSGFLNVAYIAVCLRD+FPYLRNQFMD

Sbjct: 301 DLGFPSDNPSSVESIKAAHENLRAAGRKAQRSFSSGFLNVAYIAVCLRDEFPYLRNQFMD 360

Query: 361 TEIKWEPLFEADANMLTLVGDGAIKLNQAIPGFMDADVIRDLTGVKGSDNPIPKATEVTT 420
T IKWEPLFEADANMLTLVGDGAIKLNQAIPGFMDADVIRDLTGVKG+D PIP TEVTT
Sbjct: 361 TVIKWEPLFEADANMLTLVGDGAIKLNQAIPGFMDADVIRDLTGVKGADKPIPAITEVTT 420

Query: 421 DG 422 
DG 

Sbjct: 421 DG 422 


SEQ ID 1452 (GBS364) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 6; MW 50kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 11; MW 75kDa). 

GBS364-GST was purified as shown in Figure 216, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 452 

A DNA sequence (GBSx0489) was identified in S.agalactiae <SEQ ID 1455> which encodes the amino 
acid sequence <SEQ ID 1456>. Analysis of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4063 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1457> which encodes the amino acid
sequence <SEQ ID 1458>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4120 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 101/118 (85%), Positives = 110/118 (92%)

Query: 1 MKKKCLICKKTFQAKTNRSLYCSEECRKKGIREKQRKLMKQKRADKKKEKIKVLNITJADV 60

+KKKCLICKK FQAKTNR+LYCSEECRKKG REKQRKLMKQKRA+++KEK KVLN N DV 
Sbjct: 1 LKKKCLICKKNFQAKTI^TLYCSEECRKTCGNREKQRKLMKQKRAEQRKEKKKVIiNPOT^ 60 

Query: 61 TEKPKKIRNLVQHYKKLKREILDNESEFGFTGIALVEGIDIHEENFVDLVMQKIKEQQ 118

TEKPKKIRNL QHYKKLK+EIL NESEFGFTGI L+EGID+HEENFVDLVMQKIKEQ+
Sbjct: 61 TEKPKKIRNLAQHYKKLKKEILANESEFGFTGITLIEGIDVHEENFVDLVMQKIKEQK 118

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 453 

A DNA sequence (GBSx0490) was identified in S.agalactiae <SEQ ID 1459> which encodes the amino
acid sequence <SEQ ID 1460>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0633 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC39305 GB:AF022773 ORF3 [Lactococcus bacteriophage phi31]
Identities = 75/109 (68%), Positives = 87/109 (79%), Gaps = 1/109 (0%)

Query: 29 LRADKKGTHRVAFEKNKRRLLKTAHLCGICGRPVDKSLKYPHPLSAAIDHIVPIAKGGHP 88

LRAD+ G HRVAF+KN++ LLKT + CGICG+P+DK LK P PLS +DHI+PI KGGHP
Sbjct: 3 LRADRTGAHRVAFDKNRKILLKTQNTCGICGKPIDKRLKAPDPLSPVVDHIIPINKGGHP 62

Query: 89 SSIDNLQLTHWQCNRQKSDKLFINQTAVRATVVGNRNLPQSRDWSSYAS 137

S++DNLQL HW CNRQKSDKLF N V+GNRNLPQSRDWSSY S

Sbjct: 63 SA^TONLQLAHWTCNRQKSDKLF-NVKQEEPKVLGNRNLPQSRDWSSYVS 110

A related DNA sequence was identified in S.pyogenes <SEQ ID 1461> which encodes the amino acid
sequence <SEQ ID 1462>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4185 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 88/112 (78%), Positives = 102/112 (90%)

Query: 28 KLRADKKGTHRVAFEKNKRRLLKTAHLCGICGRPVDKSLKYPHPLSAAIDHIVPIAKGGH 87

+LRADKKGTHRVAF++NK++LLK A +CGICG+PVDKSLKYPHPLSAAIDHIVPIAKGGH
Sbjct: 3 QLRADKKGTHRVAFDRNKKKLLKAATVCGICGKPVDKSLKYPHPLSAAIDHIVPIAKGGH 62



55 



Query: 88 PSSIDNLQLTHWQCNRQKSDKLFINQTAVRATWGNRNLPQSRDWSSYASKE 139 

PS + ++NLQLTHWQCNRQKSDKLF NQ + +GNRNLPQSRDWSS+A K+ 

Sbjct: 63 PSALENLQLTHWQCNRQKSDKLFANQASNEPKTIGNRNLPQSRDWSSFAFKK 114 
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Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 454 

A DNA sequence (GBSx0491) was identified in S.agalactiae <SEQ ID 1463> which encodes the amino
acid sequence <SEQ ID 1464>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.4481 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 455 

A DNA sequence (GBSx0492) was identified in S.agalactiae <SEQ ID 1465> which encodes the amino 
acid sequence <SEQ ID 1466>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2907 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF43508 GB:AF145054 ORF15 [Streptococcus thermophilus bacteriophage 7201]

Identities = 61/187 (32%), Positives = 90/187 (47%), Gaps = 31/187 (16%)

Query: 1 MNIEEAKKLIDKQSIGKGGVGDIPVVKTHIVKVLLDQIDQPQPEVPRFVADWYEKHKDSL 60
MN +EA KIK+ + +LDI+P VP++VADWYE+HKD

Sbjct: 1 MNRDEAVKKIAKEGY ISIEHAEDLYDSI1T-KPWPQYVADWYEEHKDEF 49






Query: 61 ECDL YLYHMSIY--DEEVEKDDFYYWMQTSKNPVYTLINMHQFGYTIQKEKLYT 112 

+L + H++ Y +E DF W +KN + L+NMHQFGY ++KEK YT 

Sbjct: 50 YI^HRVVRDFFEHIiNAYYFNENPlDYDFACmYNTKNAlQILVNMHQFGYEVKKEKRYT 109 

Query: 113 VEIPN--PNERQLSFVLMRQLSGNVSIKVMHRDNLDLLKTDNDLQLTESEIRKDFDWAWQ 170 

V I N E L++ R+ + RDN D +T + T E+ ++ + W 
Sbjct: 110 VRIRNLDDEETYLNYDKFRE TWVFYSRDNTDRFRTIH THKEL - EEGGFGWV 159 

Query: 171 FREEVVE 177

F E +E 
Sbjct: 160 FDCEGIE 166 

A related GBS nucleic acid sequence <SEQ ID 10927> which encodes amino acid sequence <SEQ ID
10928> was also identified.

A related DNA sequence was identified in S. pyogenes <SEQ ID 1467> which encodes the amino acid 
sequence <SEQ ID 1468>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3815 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 70/180 (38%), Positives = 98/180 (53%), Gaps = 30/180 (16%)

Query: 1 MNIEEAKKLIDKQSI-GKGGVGDIPVVKTHIVKVLLDQIDQPQPEVPRFVADWYEKHKDS 59

MNIEEAK+L+D GK V+K V+ ++DQ++QP+PEVP+ VADW E+ K+ 

Sbjct: 1 MNIEEAKELVDNSKFYGKTS SVIKAE- VRDIIDQLNQPKPEVPQCVADWIEECKEE 55 

Query: 60 LECDLYLYHMSIYDEEVEKDDFYYWMQTSKNPVYTLINMHQFGYTIQKEKLYTVEIPN-- 117

DL L ++ + W+ S + GYT++KEKLYTV++PN 

Sbjct: 56 ---DLTL--KGLFSNSDMPAKIFDWIFGSDENCRLMAEAWINGYTVEKEKLYTVDLPNGQ 110 

Query: 118 PNERQLSFVLMRQLSGNVSIKVMHRDNLDLLKTDNDLQLTESEIRKDFDWAWQFREEVVE 177
P R ++ + Q L T+N ++LTESEIRKDF+WAWQF EEV E

Sbjct: 111 PLVRGINTLYFSQN LATEN-VKLTESEIRKDFEWAWQFAEEVTE 153 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 456

A DNA sequence (GBSx0493) was identified in S.agalactiae <SEQ ID 1469> which encodes the amino 
acid sequence <SEQ ID 1470>. Analysis of this protein sequence reveals the following: 



Possible site: 46 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.5365 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 457

A DNA sequence (GBSx0494) was identified in S.agalactiae <SEQ ID 1471> which encodes the amino 
acid sequence <SEQ ID 1472>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -8.55 Transmembrane 34 - 50 ( 31 - 54)

Final Results 

bacterial membrane --- Certainty=0.4418 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9657> which encodes amino acid sequence <SEQ ID 9658> 
was also identified. 
The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1473> which encodes the amino acid 
sequence <SEQ ID 1474>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -11.25 Transmembrane 26 - 42 ( 20 - 49)

Final Results 

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 56/89 (62%), Positives = 71/89 (78%)

Query: 8 MTEQQMIDCLLYELAKKDIG^IRRlilNIITFLSIVLMAISIIiNVALQDHyKSQITELRTQL 67 

MTE+QMIDCLLYEL KKDK +++ II L+++L+ +S L V+L+ +Y+ QI LRTQL 
Sbjct: 1 MTEEQMIDCLLYELVKKDKAIKKKSIIIAALTVMLIVVSGLCVSLKSYYEPQIYGLRTQL 60

Query: 68 SRTQKQLKRASDDRARQTKRIAELTGNGG 96 

SRTQKQLKRAS+ RQTKRIA+LT NGG 
Sbjct: 61 SRTQKQLKRASEQNQRQTKRIADLTNNGG 89 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 458 

A DNA sequence (GBSx0495) was identified in S.agalactiae <SEQ ID 1475> which encodes the amino 
acid sequence <SEQ ID 1476>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2040 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
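
Each "Final Results" block lists three candidate localisations, each with a certainty score and a verdict. As a sketch only, the block given above for SEQ ID 1476 could be parsed, and the highest-scoring localisation selected, as follows. The names RESULT, parse_final_results and predicted_location are hypothetical. Picking the maximum certainty is an illustrative rule assumed here, not a description of how the prediction program itself reaches its verdict.

```python
import re

# One "Final Results" line: location, certainty score, verdict.
# Optional dashes after the location and stray spaces inside the number
# are tolerated (an assumption, to cope with the printed spacing).
RESULT = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)[\s\-\u2014]*"
    r"Certainty\s*=\s*([0-9][0-9.\s]*?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for one Final Results block."""
    results = {}
    for location, number, verdict in RESULT.findall(text):
        certainty = float("".join(number.split()))  # drop spaces inside "0. 2040"
        results[location] = (certainty, verdict)
    return results


def predicted_location(results):
    """Location with the highest certainty (illustrative rule only)."""
    return max(results, key=lambda loc: results[loc][0])


if __name__ == "__main__":
    block = """
    bacterial cytoplasm Certainty=0.2040 (Affirmative) < succ>
    bacterial membrane Certainty=0.0000 (Not Clear) < succ>
    bacterial outside Certainty=0.0000 (Not Clear) < succ>
    """
    parsed = parse_final_results(block)
    print(predicted_location(parsed), parsed)
```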

Example 459 

A DNA sequence (GBSx0496) was identified in S.agalactiae <SEQ ID 1477> which encodes the amino 
acid sequence <SEQ ID 1478>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3044 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD37108 GB:AF109874 unknown [Bacteriophage Tuc2009]
Identities = 50/143 (34%), Positives = 67/143 (45%), Gaps = 29/143 (20%)

Query: 1 MIPNFRAFNKETKKM-YG-VDGFELSVRKIYRCSLADDEFRCGRLETFHFVEDNFDDYIL 58

MIP RA++K+ ++M YG V+ F+ S+ YR HF +D

Sbjct: 1 MIPKLRAWDKQDERMSYGEVEYFDDSIN--YRFD--------------HFCTGADEDVEF 44

Query: 59 MQSTGMFDKNGVEI FDGDI VLTTRL IDY-TYKNFKGWKMLEGRWLIDTGKDA 110 

MQSTG+ DKNGVEI++GDI+ + IYY G+EGL + 

Sbjct: 45 MQSTGIKDKNGVEIYEGDILKLHAIFLAPDDKIGYLEYSPKYGYSIICEGNRLY---RQE 101 

Query: 111 VGLWTEVDENEAIGNIYQNSELL 133

T E IGNIY+N ELL 

Sbjct: 102 YWASTNKLNYEVIGNIYENPELL 124 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1479> which encodes the amino acid 
sequence <SEQ ID 1480>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4779 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 44/52 (84%), Positives = 47/52 (89%)

Query: 1 MIPNFRAFNKETKKMYGVDGFELSVRKIYRCSLADDEFRCGRLETFHFVEDN 52 

MIPNFR FNK+TKKMY +DGF+ S RKIYRCSLADDEFR GRLETFHFVEDN 
Sbjct: 1 MIPNFRGFNKKTKKMYSIDGFKSSERKIYRCSLADDEFRSGRLETFHFVEDN 52 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 460 

A DNA sequence (GBSx0497) was identified in S.agalactiae <SEQ ID 1481> which encodes the amino 
acid sequence <SEQ ID 1482>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3843 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9655> which encodes amino acid sequence <SEQ ID 9656>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 461 

A DNA sequence (GBSx0498) was identified in S.agalactiae <SEQ ID 1483> which encodes the amino 
acid sequence <SEQ ID 1484>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5189 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9653> which encodes amino acid sequence <SEQ ID 9654> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF43503 GB:AF145054 ORF10 [Streptococcus thermophilus 
bacteriophage 7201] 
Identities = 92/147 (62%), Positives = 121/147 (81%)

Query: 15 IEPKPQTRPKFSKFGTYEDPKMKRWRKEVSGWIEKNYDGPFFDDCIKVEVTFYMKAPKTL 74

IEPKPQTRP+FSKFGTYEDPKMK WR+E S IE+ YDG FF I V+VTFYMKAP ++
Sbjct: 7 IEPKPQTRPRFSKFGTYEDPKMKAWRRECSRLIEQEYDGQFFYGPISVDVTFYMKAPLSV 66

Query: 75 SKEPTQRSKGKTIQIYQNFVRELIWHAKKPDIDNLIKAVFDSISDAGYDRIQKSGIVWSD 134 

SK+PT +++ KT ++ F+ E +WH++KPDIDNLIKA+FDSIS AGY+++ K GIVW+D 
Sbjct: 67 SKKPTPKARAKTWDAFKKFMAERLWHSRKPDIDNLIKALFDSISTAGYNKVDKKGIVWTD 126 

Query: 135 DNIVCDLRAKKKYSQNPRIKVRIEEID 161

D+IVC L A+K+YS+NPRI+ I+E++ 
Sbjct: 127 DSIVCKLSAQKRYSENPRIEFEIKELE 153 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 462 

A DNA sequence (GBSx0499) was identified in S.agalactiae <SEQ ID 1485> which encodes the amino 
acid sequence <SEQ ID 1486>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4007 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 463

A DNA sequence (GBSx0500) was identified in S.agalactiae <SEQ ID 1487> which encodes the amino 
acid sequence <SEQ ID 1488>. This protein is predicted to be pXO1-07. Analysis of this protein sequence
reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3664 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC38715 GB:AF030367 maturase-related protein [Streptococcus pneumoniae] 
Identities = 146/373 (39%), Positives = 216/373 (57%), Gaps = 18/373 (4%)

Query: 35 LYDKVYRKDILKVAWFYVKRNKGSAGIDDFTIEEIEAYGVQKFLDEIEDQLRNKKYQPKA 94
L DK+ ++ + A+ VK NKGSAGID TIEE++ Y Q + ++ ++ +KY+P+
Sbjct: 4 LLDK1LSRENMLEAYNQVKSNKGSAGIDGMTIEEMDNYLRQNWR-LTKELIKQRKYKPQP 62

Query: 95 VKRVYIPKANGKKRPLGIPTVRDRWQTAVKIVIEPIFEADFQEFSYGFRPKRSANQAIR 154
V +V IPK +G R LGIPTV DR++Q A+ V+ PI E F + SYGFRP RS +AI
Sbjct: 63 VIKOTIPKPDGGIRQLGIPTVMDRMIQQAIVQVMSPICEPHFSDTSYGFRPNRSCEKAIM 122

Query: 155 EIYKYLNYGCEWVIDADLKGYFDTIPHDKLLLLVKERVTDKSIIKLLSLWLEAGIMEDNQ 214
++ +YLN G EW++D DL+ +FDT+P D+L+ LV + D L+ +L +G++ + Q
Sbjct: 123 KLLEYIM3GYEWIVI3IDLEKFFDTVPQDRLMSLVHNIIEDGDTESLIRKYLHSGVIINGQ 182

Query: 215 WSNILGTPQGGVISPIjLANIYIjNALDRYWKNNRIiEGRGHDAHIiIRYADDFVI-LCSNNP 273
++GTPQGG +SPLL+NI LN LD+ LE RG +RYADD VI + S
Sbjct: 183 RYKTLVGTPQGGNLSPLLSNIMLNELDK ELEKRG--LRFVRYADDCVITVGSEAA 235

Query: 274 KKYYQYAKQRI--DKLGLTLNEEKTRIVHATEGFDFLGYTLRKSKSHKSGKYKTYYYPSR 331
K Y+ R +LGL +N KT+I E +LG+ KS + P +
Sbjct: 236 AKRVMYSVSRFIEKRLGLKVNMTKTKITRPRE-LKYLGFGFWKSSDGWKSR PHQ 288

Query: 332 KSMKSIKGKV103VIQTGQHLNLPDVMERLNPMLRGWANYFKAGNSKQHFKSIDNYVIYNL 391
S++ K K+K + Q ++L +E+LN +RGW NYF GN K SID + L
Sbjct: 289 DSVRRFKLKLKKLTQRKWSIDLTRRIEQLNLSIRGWINYFSLGNMKSIVASIDERLRTRL 348

Query: 392 TIMLRKKHKKSGK 404
+++ K+ KK +
Sbjct: 349 RMI I WKQWKKKSR 361

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 464 

A DNA sequence (GBSx0501) was identified in S.agalactiae <SEQ ID 1489> which encodes the amino 
acid sequence <SEQ ID 1490>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3833 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9651> which encodes amino acid sequence <SEQ ID 9652>
was also identified. 

A further related DNA sequence (GBSx2517) was identified in S.agalactiae <SEQ ID 7217> which 
encodes the amino acid sequence <SEQ ID 7218>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3833 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1491> which encodes the amino acid 
sequence <SEQ ID 1492>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2299 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 113/163 (69%), Positives = 128/163 (78%), Gaps = 25/163 (15%)

Query: 1 MINNIVLVGRMTKDAELRYTPSNQAVATFSLAVNRNFKNQSGEREADFINCVIWRQQAEN 60 

MINN+VLVGRMTKDAELRYTPS AVATF+LAVNR FK+Q+GEREADFINCVTWRQ AEN 
Sbjct: 1 MINNVVLVGRMTKBAELRYTPSQVAVATFra 60 

Query: 61 LANWAKKGALVGITGRIQTRNYENQQGQRIYVTEVVAENFQLLESRNSQQ---------Q 111

LANWAKKGAL+G+TGRIQTRNYENQQGQR+YVTEVVA+NFQ+LESR +++
Sbjct: 61 LANWAKKGALIGVTGRIQTRNYENQQGQRVYVTEVVADNFQMLESRATREGGSTGSFNGG 120

Query: 112 TNQSGNSSNSY----------------FGNANKMDISDDDLPF 138

N + +SSNSY FGN+N MDISDDDLPF

Sbjct: 121 FNNNTSSSNSYSAPAQQTPNFGRDDSPFGNSNPMDISDDDLPF 163

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 465

A DNA sequence (GBSx0502) was identified in S.agalactiae <SEQ ID 1493> which encodes the amino 
acid sequence <SEQ ID 1494>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.33 Transmembrane 17 - 33 ( 17 - 33)

Final Results

bacterial membrane Certainty=0.1532 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 466 

A DNA sequence (GBSx0503) was identified in S.agalactiae <SEQ ID 1495> which encodes the amino 
acid sequence <SEQ ID 1496>. This protein is predicted to be p22 erf-like protein. Analysis of this protein
sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2469 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA97824 GB:AB044554 orf 17 [Staphylococcus aureus prophage 
phiPV83] 

Identities = 93/183 (50%), Positives = 120/183 (64%), Gaps = 5/183 (2%) 

Query: 1 MRKSESITEYAKAFCKAQLEVKQPLKDKDNPFFKSKYVPLENVTEAITTAFANNGISFSQ 60

M KSE++ E KA + EVKQPLKDK+NPFFKSKYVPLENV EAI A +G+S++Q
Sbjct: 1 MNKSETVVEINKAMVAFRKEVKQPLKDKNNPFFKSKYVPLENVVEAIDEAATPHGLSYTQ 60

Query: 61 DPTTNTENGYIDVATLVMHTSGEWVEYGPLSVKPTKNDVQGAGSAITYAKRYALSAIFGI 120
N +G + VAT++MH SGE++EY P+ + KN QGAGS I+Y KRY+LSAIFGI

Sbjct: 61 W-ALNDVDGRVGVATMLMHESGEYIEYDPVFWJAEKNTPQGAGSLISYLKRYSLSAIFGI 119

Query: 121 TSDQDDDGNEDSKPNNSRQSPKATTKKTQKTGYQTPKISNIQIETYKSDLNDIAKATNQN 180
TSDQDDDGNE S NN +PK T +TQ +T I ++ ++ + K QN
Sbjct: 120 TSDQDDDGNEASGKNN NPKQQT-RTQWASSETIGILRKEVISFTKLIKGTDKEAPQN 175

Query: 181 VEE 183
+ E

Sbjct: 176 IVE 178

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 467 

A DNA sequence (GBSx0504) was identified in S.agalactiae <SEQ ID 1497> which encodes the amino
acid sequence <SEQ ID 1498>. This protein is predicted to be gp157. Analysis of this protein sequence
reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3148 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD44102 GB:AF115103 orf 157 gp [Streptococcus thermophilus 
bacteriophage Sfi21] 



WO 02/34771 



PCT/GB01/04789 



Identities = 59/160 (36%) , Positives = 100/160 (61%) , Gaps = 3/160 (1%) 

Query: 1 MAYLYELEGIYAQLQSMDLDEETFQDTLDSIDFQSDLENNIEYFVKMLKNVQADAEKyKA 60 

MA LYEL G + ++ +M++D+ET DTL++ID+ SD EN +E +VK++K+++AD E K 
Sbjct: 1 ^TLYELTGQFLEIYNMEIDDETKLDTLEAIDVWSDYENKvEGWKYIKSLEADIEARKN 60 

Query: 61 EKEAFYKKQKQAEAKAEKYKETIRLAI^LSQKKKVDAGMFKVSLRRSKKVEILDETKIPL 120 

EK+ K ++K +K K + ++M + + +VD +FK+ +SK V +++E K+P 

Sbjct: 61 EKKRLDGLNKSDQSKIDl^KA&I^ISMTETGQTRVOT 119 

Query: 121 DYMQEKIEYKPMKAEISKALKSGIDISGVELIETESLQVK 160 

+Y + YKP K + + LKSG I G L E +L ++ 
Sbjct: 120 EY- -QIATYKPDKKTLKELLKSGKHIEGATLEERRNIiNIR 157 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 468 

A DNA sequence (GBSx0505) was identified in S.agalactiae <SEQ ID 1499> which encodes the amino 
acid sequence <SEQ ID 1500>. This protein is predicted to be tropomyosin 2. Analysis of this protein
sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4474 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 469 

A DNA sequence (GBSx0506) was identified in S.agalactiae <SEQ ID 1501> which encodes the amino
acid sequence <SEQ ID 1502>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4114 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9649> which encodes amino acid sequence <SEQ ID 9650>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 470 

A DNA sequence (GBSx0507) was identified in S.agalactiae <SEQ ID 1503> which encodes the amino 
acid sequence <SEQ ID 1504>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3799 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1505> which encodes the amino acid 
sequence <SEQ ID 1506>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3775 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 43/46 (93%), Positives = 46/46 (99%)

Query: 1 MTKQHRETLIWYRASHQEREKLLDFGLVDKSQYVTLLRQLRKKYAI 46

MTKQHRETLIWYRASHQERE+LLDFGLVDK++YVTLLRQLRKKYAI
Sbjct: 1 MTKQHRETLIWYRASHQERERLLDFGLVDKARYVTLLRQLRKKYAI 46 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 471 

A DNA sequence (GBSx0508) was identified in S.agalactiae <SEQ ID 1507> which encodes the amino 
acid sequence <SEQ ID 1508>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4308 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1509> which encodes the amino acid 
sequence <SEQ ID 1510>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4308 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 76/77 (98%), Positives = 76/77 (98%)

Query: 1 MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT 60 
MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT 
Sbjct: 1 MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT 60

Query: 61 AEADRLRIERYKQENTL 77 

AEADRLRIERYKQEN L 
Sbjct: 61 AEADRLRIERYKQENAL 77 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
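
For illustration only, the identity count reported for the GAS/GBS alignment of this Example can be checked by comparing the two aligned rows column by column. The Python sketch below is a hypothetical helper, not part of the described analysis. The names QUERY_ROWS, SBJCT_ROWS and count_identities are assumptions, and the rows are copied from the alignment printed above.

```python
# Aligned rows copied from the Example 471 alignment (GBS query, GAS subject).
QUERY_ROWS = [
    "MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT",
    "AEADRLRIERYKQENTL",
]
SBJCT_ROWS = [
    "MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT",
    "AEADRLRIERYKQENAL",
]


def count_identities(query, subject, gap="-"):
    """Return (identical_columns, total_columns) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    # A column counts as an identity only if both residues match and are not gaps.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    return identical, len(query)


if __name__ == "__main__":
    identical, columns = count_identities("".join(QUERY_ROWS), "".join(SBJCT_ROWS))
    print(f"Identities = {identical}/{columns}")  # prints "Identities = 76/77"
```

The single mismatch is the threonine/alanine pair near the C-terminus, which agrees with the 76/77 figure reported for this alignment.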

Example 472 

A DNA sequence (GBSx0509) was identified in S.agalactiae <SEQ ID 1511> which encodes the amino 
acid sequence <SEQ ID 1512>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2706 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1513> which encodes the amino acid 
sequence <SEQ ID 1514>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3316 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 52/127 (40%), Positives = 75/127 (58%), Gaps = 1/127 (0%)

Query: 160 EDRFVDVVEANLGRGLVKFEFDMINDYLIGQNVSKDLFLFAvTCVAVANNVRKFNYIARIL 219 

E + + + GR + FE + I ++ N+ ++ A++ AV NN + YI +IL 
Sbjct: 3 EKKLFENFQLTFGRMISPFEIEDIQKWIHEDNMPIEVVNLALREAVENNKISWKYINKIL 62 

Query: 220 DNWINDGIKTPEQAYQAQRDFKAKKANKIMQSQSNVPSWSNPDYKGPDLKEFALGSIDDI 279 

+W G T E+ + F K +++ + SNVPSWSNPDYK PDL+EFALGS+D I 

Sbjct: 63 VDWYKSGDTTVEKVRDRLQRFDDSKKQRSVTT-SNVPSWSNPDYKEPDLEEFALGSMDGI 121 

Query: 280 EDGSGDF 286

EDGSGDF 
Sbjct: 122 EDGSGDF 128 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 473

A DNA sequence (GBSx0510) was identified in S.agalactiae <SEQ ID 1515> which encodes the amino 
acid sequence <SEQ ID 1516>. Analysis of this protein sequence reveals the following: 

Possible site: 26 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.63 Transmembrane 13 - 29 ( 11 - 31)

Final Results

bacterial membrane Certainty=0.3251 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9647> which encodes amino acid sequence <SEQ ID 9648> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 474 

A DNA sequence (GBSx0511) was identified in S.agalactiae <SEQ ID 1517> which encodes the amino
acid sequence <SEQ ID 1518>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5822 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 475 

A DNA sequence (GBSx0512) was identified in S.agalactiae <SEQ ID 1519> which encodes the amino
acid sequence <SEQ ID 1520>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4175 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 476 

A DNA sequence (GBSx0513) was identified in S.agalactiae <SEQ ID 1521> which encodes the amino 
acid sequence <SEQ ID 1522>. This protein is predicted to be P1-antirepressor homolog. Analysis of this
protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3411 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9645> which encodes amino acid sequence <SEQ ID 9646>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG31333 GB:AF182207 ORF 169a [Bacteriophage mv4] 
Identities = 88/167 (52%) , Positives = 122/167 (72%) 

Query: 100 MLQRNEKSKQTOKYFIQVEKDFNSPEKIMARALLMADKKITNLTMENNQLQLDLKEAQKQ 159

M+ + K K++R+YFIQVEK++NSPE I+ RAL +++ +I L +N L L L+E+ K+
Sbjct: 1 MMSKTAKGKEIRQYFIQVEKNWNSPEMIIQRALEISNARIQELQAQNKSLTLQLEESNKK 60

Query: 160 ARYLDLIIESKGALRVTQIAADYGMSVNKFNRTLLEFGVQHKVNGQWILYKRHMGKGYTD 219

A YLD+I+ + L TQIAADYG S FN+ L E G+QHKVNGQWILYK +MGKGY
Sbjct: 61 ASYLDIILGTPDLLATTQIAADYGYSARTFNQLLKEVGIQHKVNGQWILYKAYMGKGYVQ 120

Query: 220 SHTFDYQDKNGHTRANVTTTWTQKGRLFLYELLKDNNILPLIEQEDI 266
S +F ++D+ GH R+ +T WTQKGR +Y++LK+N LPLIE++DI

Sbjct: 121 SKSFAFKDRKGHDRSKPSTYWTQKGRKLIYDVLKENGTLPLIERDDI 167

A related DNA sequence was identified in S.pyogenes <SEQ ID 1523> which encodes the amino acid 
sequence <SEQ ID 1524>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4214 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 130/249 (52%), Positives = 163/249 (65%), Gaps = 14/249 (5%)

Query: 19 MNQLINITLNENQEPWSGRDLHNvXNIKTQYTKWLERMSEYGFEENVDYIAISQKRLTA 78

MNQLIN+TLNENQEPWSGRDLH VL IKTQYTKWLERMSEYGF EN D++AI SQKRLTA
Sbjct: 1 MNQLINVTLNENQEPWSGRDLHKVLEIKTQYTKWLERMSEYGFVENEDFMAISQKRLTA 60

Query: 79 QGNRTEYIDHVLKLDMAKEIAMLQRNEKSKQTOKYFIQVEKDFNSPEKIMARALLMADKK 138

QGN+TEY DHVLKLDMAKEIAMLQRNEKSK+WKYFIQVEKDFNSPEKIMARALLMADKK
Sbjct: 61 QGNQTEYTDHVLKLDMAKEIAMLQRNEKSKEVRKYFIQVEKDFNSPEKIMARALLMADKK 120

Query: 139 ITNLTMENNQLQLDLKEAQKQARYLDLIIESKGALRVTQIAA DYGMSVNKFNKTL 193

+ ++L+ ++ + + + D + S ++ V ++A + + L

Sbjct: 121 V HKLEAQIEADRPKVLFADAVSASHTSILvGELAKiLKQNGVNIGATRLFTWL 173

Query: 194 LEFGVQHKVNGQ-WIL-YKRHMGKGYTDSHTFDYQDKNGHTRANVTTTWTQKGRLFLYEL 251

+ G K NG+ W + ++ + G +GH + T T KG+ + 

Sbjct: 174 RKHGYLIKRNGRDWNMPTQKSVELGLIRVKETSITHSDGHITVSKTPLVTGKGQQYFINK 233 

Query: 252 LKDNNILPL 260

+ LP+ 
Sbjct: 234 FLNQEYLPV 242 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 477 

A DNA sequence (GBSx0514) was identified in S.agalactiae <SEQ ID 1525> which encodes the amino 
acid sequence <SEQ ID 1526>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4205 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1527> which encodes the amino acid
sequence <SEQ ID 1528>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 21/63 (33%) , Positives = 31/63 (48%) , Gaps = 1/63 (1%) 

Query: 1 MQQFNLKQLREKKGFTQNELADKANVSRSLVVGLETGSYSETSTASLKKLAKALDVKIKD 60

M+ LK R K +Q LAD VSR + +E G Y+ T + + + LD + D
Sbjct: 1 MKNLKLKAARAGKDLSQQALADLVGVSRQTIAAVEKGDYNPTINLCI-AICRVLDKTLDD 59

Query: 61 LFF 63

LF+

Sbjct: 60 LFW 62

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 478 

A DNA sequence (GBSx0515) was identified in S.agalactiae <SEQ ID 1529> which encodes the amino 
acid sequence <SEQ ID 1530>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0396 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA17582 GB:D90907 hypothetical protein [Synechocystis sp.] 
Identities = 45/164 (27%) , Positives = 79/164 (47%) , Gaps = 33/164 (20%) 

Query: 102 EEEELRNLFTKLIASSMDKSKNEFNHPSFIEIIKQFDKIDAQNFKIISDLYFKKGFVATG 161 

++E L+ L+ L+AS++ +S + SF+E++KQ D +DA+ ++ L+ + 
Sbjct: 97 DDENLQTLWAMLIASALTESDRTMSTKSFVEVLKQVDIVIffiELIJIVLYLLHLRV 150 

Query: 162 TYYTTIIGQDKPLEHIASHVFVDNIjEQNDIAIQSSSLTNLERLGLIQINY--KAHvDEKE 219 

KP E ++ D+ + N + I S +L NLERLGL+ 1+ VDE+ 
Sbjct: 151 MAKPDEFTYAN DSRKYNIVQI-SVALNNLERLGLLIIHKYDDTPVDEEA 198 

Query: 220 YYNILNNSFITKKNSELKEQNKRVLTNLGMITLTLFGVRFSKTC 263 

+1 ++ N K ++LTLFG+ F + C 

Sbjct: 199 RISIW YMQDGNRSFKAH VSLTLFGIHFMRVC 229 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1531> which encodes the amino acid 
sequence <SEQ ID 1532>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0151 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 64/215 (29%) , Positives = 105/215 (48%) , Gaps = 23/215 (10%) 

Query: 65 QKIiAKEIQDWSKNIE-NLQEPSLSIAGPALEASKFYLEEEELRNLFTKLIASSMDKSKN 123 

+K EI SK + +L+EP I PA+ S+ YL E LRN+F + IAS+ ++ K 
Sbjct: 72 EKFKNEIDCEFSKIPQTSLKEPVEYILYPAINESEQYLSNETLRNMFARTIASTFNQDKE 131 

Query: 124 EFNHPSFIEI IKQFDKIDAQNFKI ISDLYFKKGFVATGTYYTTI IGQDKPLEHI 177 

+ H +F++IIKQ +DAQN +1+ IG E++ 

Sbjct: 132 KDLHSAFVQI IKQMTPLDAQNLLLINQ EGNNLIANLQIGVHYSKENLSGTVNK 184 

Query: 178 ASHVFVDNLEQNDIAIQSSSLTNLERLGLIQINYKAHVDEKEYYNILNNSFITKKNSELK 237 

A+++++ L+ + I +SS+ NL RLGLI+++Y + + Y +1 + SE+ 
Sbjct: 185 ANNIYLSKLDYSPDII-ASSIDNLTRLGLIKVDYLHYPLDSNYESIKQTTIYKSLESEIN 243 

Query: 238 EQNKRVLTNL GMITLTLFGVRFSKTCL 264 

N +N G ++LT FG +F CL 

Sbjct: 244 TLNLFKTSNTKYDIKIEKGKVSLTDFGKKFISVCL 278 

SEQ ID 1530 (GBS261) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 8; MW 31kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 479 

A DNA sequence (GBSx0516) was identified in S.agalactiae <SEQ ID 1533> which encodes the amino 
acid sequence <SEQ ID 1534>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.55 Transmembrane 3 - 19 ( 1 - 26)

Final Results

bacterial membrane --- Certainty=0.4418 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 480 

A DNA sequence (GBSx0517) was identified in S.agalactiae <SEQ ID 1535> which encodes the amino 
acid sequence <SEQ ID 1536>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.99 Transmembrane 35 - 51 ( 30 - 51)

Final Results

bacterial membrane --- Certainty=0.2996 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1537> which encodes the amino acid 
sequence <SEQ ID 1538>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.94 Transmembrane 31 - 47 ( 30 - 51)

Final Results

bacterial membrane --- Certainty=0.2975 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 45/52 (86%) , Positives = 48/52 (91%)

Query: 1 MNWKKLMLGDLEHTFTSRDGKEKTSVEFEGGVLPALLVLGGITWLIAWLITK 52
MNWKKLM GDLEHTFT+ DGKEKTS+EFEGGVLPALLVLGGI W+IAW ITK
Sbjct: 1 MNWKKLMFGDLEHTFTNHDGKEKTSIEFEGGVLPALLVLGGIAWMIAWFITK 52
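
The identity and positive counts in a summary line can be cross-checked against the middle line of the alignment, where a letter marks an identical residue, "+" marks a conservative substitution and a space marks a mismatch or gap. The following Python sketch is an illustration only; the function name midline_counts is hypothetical, and it assumes the middle line is supplied with its original column spacing intact, which is not always the case in this listing.

```python
def midline_counts(midline):
    """Return (identities, positives, columns) for one alignment middle line.

    Letters are identical residues, '+' is a conservative substitution,
    and any other character (normally a space) is a mismatch or gap.
    Positives include identities, as in the summary lines of this listing.
    """
    identities = sum(ch.isalpha() for ch in midline)
    conservative = midline.count("+")
    return identities, identities + conservative, len(midline)


if __name__ == "__main__":
    # Middle line of the 52-column GAS/GBS alignment shown above.
    line = "MNWKKLM GDLEHTFT+ DGKEKTS+EFEGGVLPALLVLGGI W+IAW ITK"
    print(midline_counts(line))  # -> (45, 48, 52)
```

For this alignment the result agrees with the printed 45/52 identities and 48/52 positives. For alignments spanning several blocks, the counts of the individual middle lines would be summed.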

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 481 

A DNA sequence (GBSx0518) was identified in S.agalactiae <SEQ ID 1539> which encodes the amino
acid sequence <SEQ ID 1540>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3445 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 482 

A DNA sequence (GBSx0519) was identified in S.agalactiae <SEQ ID 1541> which encodes the amino 
acid sequence <SEQ ID 1542>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3934 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
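
Each "Final Results" block lists a certainty value per predicted location, and the location with the highest value is the one marked "Affirmative". A minimal Python sketch of reading such a block follows; it is not part of the original analysis, the function names are hypothetical, and the pattern tolerates the irregular spacing found in this listing (for example "Certainty=0 . 0000").

```python
import re

# "bacterial <location> [---] Certainty=<value> (<label>)"; dashes are optional
# and stray spaces inside the number are allowed.
_RESULT = re.compile(
    r"bacterial\s+(\w+)[\s\-]*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, label) tuples in printed order."""
    results = []
    for m in _RESULT.finditer(text):
        certainty = float(m.group(2).replace(" ", ""))
        results.append((m.group(1), certainty, m.group(3).strip()))
    return results


def most_likely_location(text):
    """Return the entry with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None


if __name__ == "__main__":
    block = """
    bacterial cytoplasm --- Certainty=0.3934 (Affirmative) < succ>
    bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
    bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
    """
    print(most_likely_location(block))
```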

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 483 

A DNA sequence (GBSx0520) was identified in S.agalactiae <SEQ ID 1543> which encodes the amino 
acid sequence <SEQ ID 1544>. This protein is predicted to be repressor protein. Analysis of this protein
sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0905 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9643> which encodes amino acid sequence <SEQ ID 9644>
was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1545> which encodes the amino acid
sequence <SEQ ID 1546>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3117 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 175/264 (66%) , Positives = 207/264 (78%) , Gaps = 19/264 (7%)

Query: 34 LGKyiKKyRDTNNLSMAEFAKESGISKAY--VSIIiEKNRDPRNGKEIIPSIPIIKKVSDT 91
LG I+K R+ N++ E ++ G+ K Y VS EKN + GK++ KK+++
Sbjct: 24 LGDRIRKLREGRNMTQTELSE I LGM - KTYTTVSKWEKNENFPKGKDIi KKLAEI 75

Query: 92 IGISFDDLLNSLDENQIVALNETKTEKWLTSSTLQKITSTSSQDEQPRQEKVLSFANEQL 151

++ D LL L ++K K + +1 S +QLEQPRQEKVL+FANEQL
Sbjct: 76 FNVTSDYLLG LTDSKLGKITIQNEQPEIVSIYNQLEQPRQEKVLNFANEQL 126

Query: 152 EEQNKOTSMFDRKVEETENYITDYVEGLVAAGLGAYQEDNLHMEVKLRADDVPDKYDTIA 211 

EEQNK VS+FD+K EETE+YITDYVEGLVAAGLGAYQEDNLHM+VKLR+DDVPD+YDTIA 
Sbjct: 127 EEQNKTVSIFDKKSEETEDYITDYVEGLVAAGLGAYQEDNLHMKVKLRSDDVPDEYDTIA 186 

Query: 212 KVAGNSMEPLIQDNDLLFVKVSSQVDMNDIGIFQVNGKNFVKKLKRDYDGAWYLQSLNKS 271 

KA/AG+SMEPLIQDM)LLF+KVSSQVDMlSnDIGIFQWGKNFVKKLKRDYDGAWYLQSLNKS 
Sbjct: 187 KVAGDSMEPLIQDITOLLFIKVSSQVDMNDIGIFQWGKMFVKKLKRDYDGAWYLQSIjNKS 246 

Query: 272 YEEIYLSENDNIRTIGEWDIYRE 295 

YEEIYLS++D+IRTIGEWDIYRE 
Sbjct: 247 YEEIYLSKDDDIRTIGEWDIYRE 270 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 484 

A DNA sequence (GBSx0521) was identified in S.agalactiae <SEQ ID 1547> which encodes the amino 
acid sequence <SEQ ID 1548>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 485 

A DNA sequence (GBSx0522) was identified in S.agalactiae <SEQ ID 1549> which encodes the amino 
acid sequence <SEQ ID 1550>. This protein is predicted to be integrase (ripX). Analysis of this protein 
sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.3750 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



>GP:CAB96616 GB:AJ400629 integrase [Streptococcus pneumoniae 
bacteriophage MM1] 
Identities = 36/59 (61%) , Positives = 48/59 (81%) , Gaps = 1/59 (1%) 



Query: 2 KIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKE-IINK 59

KI + +H+FRHSHISFLAE G+P+ +IMDRVGHS+ K TL IYSHTT +M++ ++NK 
Sbjct: 312 KIEKNLSSHIFRHSHISFLAESGLPIKSIMDRVGHSNAKMTLEIYSHTTEDMEDKLVNK 370 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1551> which encodes the amino acid 
sequence <SEQ ID 1552>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2719 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 63/71 (88%) , Positives = 66/71 (92%) 

Query: 1 MKIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKEIINKQ 60
+KIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKEIINKQ

Sbjct: 1 LKIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKEIINKQ 60

Query: 61 TAPFVPLLKSE 71
T PF +K +
Sbjct: 61 TDPFKTGIKQK 71

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 486 

A DNA sequence (GBSx0523) was identified in S.agalactiae <SEQ ID 1553> which encodes the amino
acid sequence <SEQ ID 1554>. This protein is predicted to be 50S ribosomal protein L19 (rplS). Analysis
of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3331 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9641> which encodes amino acid sequence <SEQ ID 9642> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC01534 GB:U88973 ribosomal protein L19 [Streptococcus thermophilus] 
Identities = 110/115 (95%) , Positives = 112/115 (96%)

Query: 25 MNPLIQSLTEGQLRSDIPEFRAGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 84

MNPLIQSLTEGQLR+DIP FR GDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT
Sbjct: 1 MNPLIQSLTEGQLRTDIPSFRPGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 60

Query: 85 VRKISGGIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIRR 139

VRKIS GIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIR+
Sbjct: 61 VRKISSGIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIRK 115

A related DNA sequence was identified in S.pyogenes <SEQ ID 1555> which encodes the amino acid
sequence <SEQ ID 1556>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4849 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/115 (96%), Positives = 113/115 (97%) 

Query: 25 MNPLIQSLTEGQLRSDIPEFRAGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 84

MNPLIQSLTEGQLRSDIP FR GDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT
Sbjct: 1 MNPLIQSLTEGQLRSDIPNFRPGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 60

Query: 85 VRKISGGIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIRR 139

VRKISGGIGVERTFPIHTPRVDKIEV+R+GKVRRAKLYYLRALQGKAARIKEIRR
Sbjct: 61 VRKISGGIGVERTFPIHTPRVDKIEVIRHGKVRRAKLYYLRALQGKAARIKEIRR 115

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 487 

A DNA sequence (GBSx0524) was identified in S.agalactiae <SEQ ID 1557> which encodes the amino 
acid sequence <SEQ ID 1558>. This protein is predicted to be ISL2 protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 58

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC18596 GB:AJ278419 IS1381 transposase [Streptococcus pneumoniae] 
Identities = 111/129 (86%) , Positives = 117/129 (90%) 

Query: 1 MKAQAIVTSQGRIVSLDIAVNYCHDMKLFKMSRRNIGQAAKILADSGYQGIMKMYSQAQT 60

MK QAIVTSQGRIVSLDI VNYCHDMKLFKMSRRNIGQA KILADSGYQG+MK+Y QAQT
Sbjct: 1 MKTQAIVTSQGRIVSLDITVNYCHDMKLFKMSRRNIGQAGKILADSGYQGLMKIYPQAQT 60

Query: 61 PRKSSKLKPLTLEDKTYNHTLSKERIKVENIFAKVKTFKIFSTTYRNRRKRFGLRMNLIA 120

 RKSSKLKPLT+EDK  NH LSKER KVENIFAKVKTFK+FSTTYR+ RKRFGLRMNL A
Sbjct: 61 SRKSSKLKPLTVEDKACNHALSKERSKVENIFAKVKTFKMFSTTYRSHRKRFGLRMNLSA 120

Query: 121 GMINRELGF 129

G+IN ELGF
Sbjct: 121 GIINHELGF 129

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 488

A DNA sequence (GBSx0526) was identified in S.agalactiae <SEQ ID 1559> which encodes the amino 
acid sequence <SEQ ID 1560>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-10.99 Transmembrane 81 - 97 ( 67 - 107) 
INTEGRAL Likelihood = -6.32 Transmembrane 8 - 24 ( 6 - 25) 
INTEGRAL Likelihood = -2.76 Transmembrane 120 - 136 ( 120 - 136) 



Final Results

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04382 GB:AP001509 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 53/150 (35%) , Positives = 82/150 (54%) , Gaps = 1/150 (0%) 

Query: 1 MLNPYKRIFTLGLLATFLLFIFHFGRYSGLGTNLIEASFTNKNLYDYDWLLKLCLTVITL 60 

M N R F GL+ L +1 Y+G G +++E SFT +++ Y +L KL T +T+ 

Sbjct: 251 MKNHTVRAFVGGLIIVALTYIIGSYDYNGRGLDMLEDSFT-QDVPPYAFLAKLVFTAVTM 309 

Query: 61 AAGYQGGEVTPLFAIGASLGVIIAPILGLPVILVAALGYTSVFGSATNTLLGPILIGGEV 120 

G+ GGE PLF +GA+LG + + LP+ +AALG FG NT + L+G E+ 
Sbjct: 310 GMGFVGGEAIPLFFVGATLGNTLHAFIDLPLSFLAALGMIVTFGGGANTPIAAFLLGVEM 369 

Query: 121 FGFANTPYFVIVCLVAYS ISHAHTIYGAQS 150 

F +F + CL +Y S H ++ +Q+ 

Sbjct: 370 FNGKGIEFFFVACLTSYLFSGHHGLWPSQT 399 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1561> which encodes the amino acid
sequence <SEQ ID 1562>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.5798 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB04382 GB:AP001509 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 129/397 (32%) , Positives = 210/397 (52%) , Gaps = 14/397 (3%) 

Query: 20 VLGLVGLALPIGGAVGVVDVIFGKGLLFLSEYRDHHIjFLLLPFLALAGLVIVFLYDKLG- 78 

+L + + IG VG + L E R++ + +L FL LAGL + +LY K G 

Sbjct: 9 LLTWIFFGIMIGAIVGSATALLLTVNDHLGETRENRPWEVL-FLPLAGLALGYLYMKAGT 67 

Query: 79 KEVRQGMGLVFQVGHGQKNQIPPMLIPLILFSTWVTHLFGASAGREGVAVQIGATIS 135 

E+ +G LV + G K ++ L PL+ T++T LFG S GREG A+Q+G +++ 
Sbjct: 68 SAGNELYKGNNLVIESVQG-KGKMLLRLGPLVYLGTFMTILFGGSTGREGAAIQMGGSVA 126

Query: 136
+ F R LL+ G++AGF F TPI A +F +E+ +G L++ AL+P LVA
Sbjct: 127

Query: 195
++V +T+ +E ++ ++ LT K+I L ++F LV + L G K
Sbjct: 187

Query: 253
+ N R AF+G L+ + L +IG Y+G G +++ +F+ Q + Y +L K++
Sbjct: 247

Query: 311
T +++ GF GGE PLF +GA+LG L ++ LP+ +AALG FG NT A
Sbjct: 304

Query: 371
+G+E+F + +FV +Y+ S H ++ Q +
Sbjct: 364

An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/147 (61%) , Positives = 111/147 (74%) 

Query: 3 NPYKRIFTLGLLATFLLFIFHFGRYSGLGTNLIEASFTNKNLYDYDWLLKLCLTVITLAA 62

NPY RI +G L + L I H GRYSGLGTNLI A+F+ + + YDWLLK+ +TVI+L+A
Sbjct: 259 NPYFRIAFIGALLSICLMIGHVGRYSGLGTNLIAAAFSGQTILTYDWLLKMIVTVISLSA 318

Query: 63 GYQGGEVTPLFAIGASLGVIIAPILGLPVILVAALGYTSVFGSATNTLLGPILIGGEVFG 122
G+QGGEVTPLFAIGASLG+++AP LGLPV+LVAALGYT+VFGSATNT PI IG EVFG

Sbjct: 319 GFQGGEVTPLFAIGASLGIVLAPYLGLPVLLVAALGYTTVFGSATNTFWAPIFIGIEVFG 378

Query: 123 FANTPYFVIVCLVAYSISHAHTIYGAQ 149
N + + AY +SH H+IY Q
Sbjct: 379 PENALAYFVTSAAAYMVSHRHSIYSYQ 405

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 489 

A DNA sequence (GBSx0527) was identified in S.agalactiae <SEQ ID 1563> which encodes the amino
acid sequence <SEQ ID 1564>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.65 Transmembrane 47 - 63 ( 45 - 70)

INTEGRAL Likelihood = -5.04 Transmembrane 219 - 235 ( 208 - 237)

INTEGRAL Likelihood = -3.35 Transmembrane 168 - 184 ( 168 - 187)

INTEGRAL Likelihood = -0.48 Transmembrane 141 - 157 ( 141 - 157)

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
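
The "INTEGRAL Likelihood" lines above follow a fixed layout and can be collected into records for comparison between proteins. The following Python sketch is illustrative only and not part of the original analysis: the names Segment and parse_integral_lines are hypothetical, and the parenthesised range is stored exactly as printed because its interpretation is not stated in this section.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float   # value printed after "Likelihood ="
    start: int          # first residue of the predicted transmembrane span
    end: int            # last residue of the predicted transmembrane span
    range_start: int    # parenthesised range, kept as printed
    range_end: int


_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(text):
    """Return all INTEGRAL entries in text, most negative likelihood first."""
    segments = [
        Segment(float(m.group(1)), *(int(g) for g in m.groups()[1:]))
        for m in _INTEGRAL.finditer(text)
    ]
    return sorted(segments, key=lambda s: s.likelihood)


if __name__ == "__main__":
    listing = """
    INTEGRAL Likelihood = -8.65 Transmembrane 47 - 63 ( 45 - 70)
    INTEGRAL Likelihood = -5.04 Transmembrane 219 - 235 ( 208 - 237)
    INTEGRAL Likelihood = -3.35 Transmembrane 168 - 184 ( 168 - 187)
    INTEGRAL Likelihood = -0.48 Transmembrane 141 - 157 ( 141 - 157)
    """
    for seg in parse_integral_lines(listing):
        print(seg.likelihood, seg.start, seg.end)
```

Sorting by likelihood reproduces the order in which the listing itself prints the segments.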

A related GBS nucleic acid sequence <SEQ ID 9317> which encodes amino acid sequence <SEQ ID 9318>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04382 GB:AP001509 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 75/223 (33%) , Positives = 119/223 (52%) , Gaps = 18/223 (8%)

Query: 17 FSLLIGGVVGAITAVFGRVLLFLTAFRSDYIAYLLPFLSIVGLFIVFVYQKFGGKS 72

F ++IG +VG+ TA+ V L R + ++L FL + GL + ++Y K G + 
Sbjct: 15 FGIMIGAIVGSATALLLTVNDHLGETRENRPWFVL-FLPLAGLALGYLYMKAGTSAGNEL 73 

Query: 73 VKGMGLVFEVGHGNEETIPKRLVPLVILTTWLTHLFGGSAGREGVAVQIGATVSHYFQKY 132 

KG LV E G + + RL PLV L T++T LFGGS GREG A+Q+G +V+ K 
Sbjct: 74 YKGNNLVIESVQGKGKML-LRLGPLVYLGTFMTILFGGSTGREGAAIQMGGSVAEAVNKL 132 

Query: 133 CRLQNASQLFLVM-GMAAGFAGLFQTPLAATFFAIEVLVVGRLMVSYVLPSLIAALTANF 191 

+++ L+M G++AGF F TP+ A F +E+ +GRL ++P L+A+ ++ 

Sbjct: 133 FKVKLIDTRILLMGGISAGFGAAFGTPITAAIFGMEMASLGRLKFEALVPCLVASFVGHY 192 

Query: 192 VSHSLGLEKFSH SIATSMALTPDIILKLLVLGLCFGL 228 

+ EKF H IAT ++ K+++L + F L 

Sbjct: 193 TT EKFWHVEHEKFI IATVPEVSALTFSKVILLAIVFSL 230 

There is also homology to SEQ ID 1562. 

A related GBS gene <SEQ ID 8577> and protein <SEQ ID 8578> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: 9.66 
GvH: Signal Score (-7.5): -1.12 

Possible site: 27 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 7 value: -10.99 threshold: 0.0
modified ALOM score: 2.70

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01989(349 - 1491 of 1794)

GP|4512350|dbj|BAA75315.1||AB011836 (15 - 399 of 424) similar to Bordetella parapertussis
transposase for insertion sequence element (27%-identity) {Bacillus halodurans}
PIR|T44296|T44296 hypothetical protein [imported] - Bacillus halodurans
%Match = 15.4

%Identity = 33.4 %Similarity = 54.7

Matches = 129 Mismatches = 167 Conservative Sub.s = 82



222 252 282 312 342 372 402 432 

MY*RKSKTINLTM*YEQLSKTL*QNLVFIKRRIL*TVIKRFDNYAQYVLVLGFSLLIGGVVGAITAVFGRVLLFLTAFRS 

I ::|| :||: lh= I I I 
MNKTFWLTLLTWIFFGIMIGAIVGSATALLLTVNDHLGETRE 
10 20 30 40 



462 492 513 540 570 600 630 660 

DYIAYLLPFLSIVGLFIVFVYQKFG GKSV-KGMGLVFEVGHGNEETIPKRLVPLVILTTWLTHLFGGSAGREGVAVQ 

: ::| II = II = = = l I I I = I I I I I = I = = I I III I l = = l lllll llll | = | 
NRPWFVL-FLPLAGIALGYLYMKAGTSAGNELYKGNNLVIESVQG-KGKMLLRLGPLVYLGTFMTILFGGSTGREGAAIQ 
60 70 80 90 100 110 120

690 720 747 777 807 837 867 894

IGATVSHYFQKYCRLQNASQLFLVMG-MAAGFAGLFQTPLAATFFAIEVLWGRLMVSYVLPSLIAALTANFVSHSL-GL 
:| ;|: | ::: |:|| ::||| | ||: | | :|: ;||| ::| |:|:: :: : : : 

MGGSVAFAVNKLFKVKIIDTRILLMGGISAGFGAAFGTPITAAIFGMEMASLGRLKFEALVPCLVASFVGHYTTEKFWHV 
130 140 150 160 170 180 190 200

924 954 984 1014 1041 1071 1101 1131 

EKFSHSIATSMALTPDIILKLLVLGLCFGLCGNLFAYLLAKA-KLIASSRLLNPYKRIFTLGLLATFLLFIFHFGRYSGL 

I III - |::s| : I I h I II = I I I 11= I =1 1=1 

EHEKFIIATVPEVSALTFSKVILLAIVFSLVSVLYCQLRHGIHKLS

210 220 230 240 250 260 270 280 

1161 1191 1221 1251 1281 1311 1341 1371 

GTNLIEASFTNKNLYDYDWLLKLCLWITLAAGYQGGEVTPLFAIGASLGVIIAPILGLPVILVAALGYTSVFGSATNTL 
| :::| HI ::: | :| || :| :|: |: ||| ||| :||:|| : : ||= ==1111 II II

GLDMLEDSFT-QDVPPYAFIAKLVFTAVTMGMGFVGGEAIPLFFVG^^ 

290 300 310 320 330 340 350 

1401 1431 1461 1491 1521 1551 1581 1611 

LGPILIGGEVFGFANTPYFVIVCLVAYSISHAHTIYGAQSR*LVMSFKRVYQFVERNIPFSFLFS*SL*KWSLSIL*MQK
: |:| (:| :| > || :| I I = = :|: 

IAAFLLGVEMFNGKGIEFFFVACLTSYLFSGHHGLWPSQTIYEPKSRLYGVRKGETIKRTEEMKE 
370 380 390 400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 490 

A DNA sequence (GBSx0528) was identified in S.agalactiae <SEQ ID 1565> which encodes the amino 
acid sequence <SEQ ID 1566>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3568 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB98234 GB:U67480 chorismate mutase/prephenate dehydratase 
(pheA) [Methanococcus jannaschii] 
Identities = 26/85 (30%) , Positives = 46/85 (53%) , Gaps = 1/85 (1%)

Query: 2 ELEEIRQEIDEIDQQLVSLLETRMGLILEVIAFKKKHRLPVLDNNRE^VljqNVLKKVQN 61 

+L EIR++IDEID +++ L+ R L +V K + +P+ D RE + + + K + 
Sbjct: 4 KLAEIRKKIDEIDNKILKLIAERNSLAKDVAEIKNQLGIPINDPEREKYIYDRIRKLCKE 63 

Query: 62 HQFDDVIRATFKDIMTE-SRVYQKE 85 

H D+ I 1+ E ++ QK+ 

Sbjct: 64 HNVDENIGIKIFQILIEHNKALQKQ 88 

50 A related DNA sequence was identified in S.pyogenes <SEQ ID 1567> which encodes the amino acid 
sequence <SEQ ID 1568>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>» Seems to have no N-terminal signal sequence 

55 Final Results 

bacterial cytoplasm Certainty=0. 2356 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 



60 An alignment of the GAS and GBS proteins is shown below: 
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Identities = 45/91 (49%) , Positives = 62/91 (67%) 

Query: 1 MELEEIRQEIDEIDQQLVSLLETRMGLILEVIAFKKKHRLPVLDiraREIffiVIJSIlIVLKKVQ 60 

M LE+IRQEI+ ID LV+LLE RM L+ +V A+K + LPVLD REN++L+ V V+ 
Sbjct: 1 MRLEKIRQEINGIDHHLVALLEKRMALVEQVTAYKLANHLPVLDQARENQILDRVSYLVK 60 

Query: 61 NHQFDDVIRATFKDIMTESRVYQKENIVDGD 91 

+ F+ I TFK IM+ SR YQ +++ GD 
Sbjct: 61 DQAFEPAIHETFKTIMSLSRQYQTQHLTGGD 91

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 491 

A DNA sequence (GBSx0529) was identified in S.agalactiae <SEQ ID 1569> which encodes the amino 
acid sequence <SEQ ID 1570>. This protein is predicted to be neuraminidase. Analysis of this protein
sequence reveals the following: 






Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.35 Transmembrane 28 - 44 ( 28 - 47) 



Final Results 

bacterial membrane --- Certainty=0.2338 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10191> which encodes amino acid sequence <SEQ ID
10192> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA51473 GB:X72967 neuraminidase [Streptococcus pneumoniae] 
Identities = 294/504 (58%) , Positives = 380/504 (75%) , Gaps = 10/504 (1%)



Query: 303
E+++ Q + + + KLP+GA L+ KT+++ G G+ NKD + YRIP+LLKT+KGT
Sbjct: 299

Query: 363
L+ GADER + DWG+ IGMVIRRSED+G TWG R TI NLR+NP+ S GSP+
Sbjct: 359 LIAGADERRLHSSDWGDIGMVIRRSEDNGKTWGDRVTITNLRDNPKA SDPSIGSPV 414

Query: 423
N+DM LVQD +TKRI FS I YDMFPEG+G+ +++ E+ Y +1 G++Y LY G+K
Sbjct: 415

Query: 483
+TIR+ G VY GK TDY V+ + K +S+ GD+YKG QLLGNIYFT +KTSPFR+A
Sbjct: 472

Query: 543
K SY+WMSYSDDDG+TWS+P+DIT ++ MKFLG+GPG GIVL+ GPH GRI+IP Y+
Sbjct: 532

Query: 603
TN SHL GSQSSR+IYSDDHGKTWH G+AVNDNR + +G+KIHS TM+N++ QNTES
Sbjct: 592

Query: 663
VQL NGD+KLFMR LTG+L+VATSKDGG TW+ +KRY +V D YVQ+SAI H+ KEY
Sbjct: 651

Query: 723 ILLTOANGPGKKRQDGYAR^QvNRNGSFKiaYHHHIQDGSFA™^ 782

I+L NA GP KR++G LA+V NG WL H+ IQ G FAYNS+Q+L N ++G+LYE 
Sbjct: 711 IILSNAGGP--KEENGMVHLARVEENGELTWLKHNPIQKGEFAYNSLQELGNGEYGILYE 768 

Query: 783 HREKHQNSFTLNYKVFNWSFLSQN 806 

H EK QN++TL+++ FNW FLS++ 
Sbjct: 769 HTEKGQNAYTLSFRKFNWDFLSKD 792 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 492 

A DNA sequence (GBSx0530) was identified in S.agalactiae <SEQ ID 1571> which encodes the amino 
acid sequence <SEQ ID 1572>. This protein is predicted to be unnamed protein product (gatC). Analysis of 
this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.63 Transmembrane 154 - 170 ( 149 - 178)
INTEGRAL Likelihood =-11.99 Transmembrane 103 - 119 ( 98 - 123)
INTEGRAL Likelihood = -7.91 Transmembrane 21 - 37 ( 14 - 40)
INTEGRAL Likelihood = -6.53 Transmembrane 448 - 464 ( 444 - 467)
INTEGRAL Likelihood = -5.89 Transmembrane 47 - 63 ( 45 - 68)
INTEGRAL Likelihood = -5.10 Transmembrane 356 - 372 ( 352 - 373)
INTEGRAL Likelihood = -4.78 Transmembrane 330 - 346 ( 328 - 350)
INTEGRAL Likelihood = -4.41 Transmembrane 376 - 392 ( 375 - 393)
INTEGRAL Likelihood = -3.72 Transmembrane 243 - 259 ( 235 - 266)
INTEGRAL Likelihood = -2.55 Transmembrane 277 - 293 ( 275 - 293)

A related DNA sequence was identified in S.pyogenes <SEQ ID 1573> which encodes the amino acid 
sequence <SEQ ID 1574>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have an uncleavable N-term signal seq 



INTEGRAL Likelihood =-12.31 Transmembrane 154 - 170 ( 150 - 179)
INTEGRAL Likelihood =-11.68 Transmembrane 104 - 120 ( 99 - 124)
INTEGRAL Likelihood = -9.82 Transmembrane 447 - 463 ( 442 - 469)
INTEGRAL Likelihood = -7.91 Transmembrane 22 - 38 ( 11 - 41)
INTEGRAL Likelihood = -7.11 Transmembrane 377 - 393 ( 375 - 403)
INTEGRAL Likelihood = -5.89 Transmembrane 48 - 64 ( 46 - 69)
INTEGRAL Likelihood = -4.78 Transmembrane 331 - 347 ( 329 - 351)
INTEGRAL Likelihood = -3.88 Transmembrane 357 - 373 ( 353 - 373)
INTEGRAL Likelihood = -2.55 Transmembrane 278 - 294 ( 276 - 294)
INTEGRAL Likelihood = -1.22 Transmembrane 240 - 256 ( 240 - 257)


Final Results

bacterial membrane --- Certainty=0.6052 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Final Results

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 419/482 (86%) , Positives = 447/482 (91%) 



Query: 1 MQVFIjNIWKFFDPIIHMGSGVVMLIVMTGIiAMIFGVKFSKALEGGIKLAIALTGIGAII 60 
MQ FL+I+NK I +GSGVVMLIVMTGIAMIFGVKF+KALEGGIKLAIALTGIGAII
Sbjct: 2 MQPFLDIINKILGFPIQLGSGWMLIWTGIjMIFGVKFTKALEGGIKLAIALTGIGAII 61

Query: 61 GILTGAFSESLQAFVKOTGINLSIIDVGWAPLATITWGSPYTLYFLLIMLIWIVMIVMK 120 

GILTGAFSESLQAFVKNTGI +L+ 1 1DVGWAPLATITWGSPYTLYFLL+ML+VNIVMIVMK 
Sbjct: 62 GILTGAFSESLQAWKNTGISLNIIDVGWAPLATITWGSPYTLYFLLVMLVVNIVMIVMK 121 

Query: 121 KTDTLDVDIFDIWHLSITGLLIMWYAKKNNLPTLLSVIIATVAIIFVGVLKIINSDLMKP 180

KTDTLDVDIFDIWHLSITGLLIMWYA +N+LP +S++IATVA+I VGVLKIINSDLMKP
Sbjct: 122 KTDTLDVDIFDIWHLSITGLLIMWYAARNHLPVFVSLLIATVAVILVGVLKIINSDLMKP 181
Query: 181 TFDDLLGTGPTSPMTSTHMNYMMNPIIMVLDKLFDKVFPGLDKYDFDAAKLNKA.IGFWGS 240 

TFDDLLGTGP SPMTSTHMNYMMNPIIMVLDK+FDKVFPGLDKYDFDAAKLNK IGFWGS 
Sbjct: 182 TFDDLLGTGPQSPMTSTHMNYMMNPIIMVLDKIFDKVFPGLDKYDFDAAKLNKKIGFWGS 241 

Query: 241 KFFIGMILGLVIGIMGNPVFSFAALGGWFSLGFTAGACLELFSLIGSWFIAAVEPLSQGI 300 

KFFIGM LG VIGIMG+P F+ ++ WF LGFTAGACLELFSLIGSWFIAAVEPLSQGI 
Sbjct: 242 KFFIGMALGFVIGIMGDPHFTVESIKNWFGLGFTAGACLELFSLIGSWFIAAVEPLSQGI 301 

Query: 301 TNFANGKMHGRRFNIGLDWPFIAGRAEIWACANIIAPIMLVEAILLSKVGNGILPLAGII 360 

TNFAN +MHGRRFNIGLDWPFIAGRAEIWACANILAPIML+EA+LLSKVGNGILPLAGII 
Sbjct: 302 TNFANARMHGRRFNIGLDWPFIAGRAEIWACANILAPIMLIEAVLLSKVGNGILPLAGII 361 

Query: 361 AMGVTPALLWTRGRLIRMITFGTLLLPLFLLSGTMIAPFATELAKKVGAFPAGARAGSL 420 

AMG+TPALLWTRGRLIRMI FG+LLLPLFLLSGTMIAPFATELAKKVGAFPAG AGSL 
Sbjct: 362 AMGMTPALLWTRGRLIRMI I FGSLLLPLFLLSGTMIAPFATELAKKVGAFPAGTSAGSL 421 

Query: 421 ITHSTLEGPMEKIFGYVIGKATTGQLSAI ITLI I FATAYLGLFMWYAKQMKRRNAEYAAN 480 

ITHSTLEGPMEKIFGYVIG+ATTGQ+++IITLIIF YL LF WYA QMK RNAEYA 
Sbjct: 422 ITHSTLEGPMEKIFGYVIGQATTGQIASIITLIIFVAIYLSLFAWYANQMKARNAEYAKT 481 

Query: 481 QK 482 
K 

Sbjct: 482 MK 483 

A related GBS gene <SEQ ID 8579> and protein <SEQ ID 8580> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 4.31 
GvH: Signal Score (-7.5): -2.64 

Possible site: 34 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 6 value: -12.63 threshold: 0.0 



INTEGRAL Likelihood =-12.63 Transmembrane 154 - 170 ( 149 - 178)
INTEGRAL Likelihood =-11.99 Transmembrane 103 - 119 ( 98 - 123)
INTEGRAL Likelihood = -7.91 Transmembrane 21 - 37 ( 14 - 40)
INTEGRAL Likelihood = -5.89 Transmembrane 47 - 63 ( 45 - 68)
INTEGRAL Likelihood = -4.88 Transmembrane 243 - 259 ( 235 - 265)
INTEGRAL Likelihood = -1.22 Transmembrane 268 - 284 ( 268 - 284)
PERIPHERAL Likelihood = 0.85 127

modified ALOM score: 3.03 



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6052 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF00838(343 - 1122 of 1455)

EGAD|91348|EC2092(9 - 344 of 451) PTS system, galactitol specific IIC component
{Escherichia coli} OMNI|NT01EC2494 PTS system galactitol-specific enzyme IIC component
SP|P37189|PTKC_ECOLI PTS SYSTEM, GALACTITOL-SPECIFIC IIC COMPONENT (EIIC-GAT) (GALACTITOL-
PERMEASE IIC COMPONENT) (PHOSPHOTRANSFERASE ENZYME II, C COMPONENT).

GP|1736809|dbj|BAA15955.1||D90847 PTS system, Galactitol-specific IIC component (EIIC-GAT)
(Galactitol-permease IIC component) (Phosphotransferase enzyme II, C component).
{Escherichia coli} GP| 17884
%Match = 10.9

%Identity = 29.8 %Similarity = 59.2
Matches = 68 Mismatches = 88 Conservative Sub.s = 67

282 312 342 372 402 432 462 492 

LS*HI*NWN*S*RRRNMQVFLNIWKFFDPIIHMGSGWMLIVM!G]^^ 

|: :| |:: ||: :: ]:|:| :: |: : | : |]| :||:: 

0 MFSEVMRYILDLGPTVMLPIVIIIFSKILGMKAGDCFKAGLHIGIGFVGIGLVIGLML 

10 20 30 40 50 

522 552 582 612 642 672 702 

GAFSESLQAFVKNTGINLSIIDVGWAP^TITWGSPYTLYFLLIMLIWIVMIVMKKTDTLDVDIFDIWHLSITGLLIM- 



DSIGPAAKAMAENFDIiNLHVVDVGWPGSSPMTWASQIALVAIPIAILVNVM 

70 80 90 100 110 120 130 

747 774 804 834 864 894 



WYAKKN-NLPTLLSVI IATVAI I FVGVLKI INSDLMKPTFDDLLGTGPTSPMTSTH 

|:|: |: | : | ::| 

ATGSWMIGMAGWIHAAFVYKLGDWFARDTRNFFELEGIAIPHGTSAYMG 

150 160 170 180 

924 954 984 1014 1044 



MNYMMNPIIIWLDKLFDKVFPGLDKYDFDAAKLNKAIGFWGSKFFIGMILGLVIXIM 

II :=:] = =1= 11=== | | : : | :| :| ::||:| |: 

PIAVLVDAI IEKI -PGVNRIKFSADDIQRKFGPFGEPVTVGFVMGLI IGILAGYDVKGVLQLAVKTAAVML 

200 210 220 230 240 250 


1092 1122 1152 1182 1212 1242 

— - - - -GNPVFSFASIRWLVFFFAn J QQGACLEVGLF*LVSWVQLLQ*NHFLRKLLILIWNAXX* 

II 1= I = =11 = 

-WSASLIFIPLTILIAVCVPGNQVLPFGDLATIGFEVAMAVAVHRGNLFRTLISGVIIMSITLWIATQTIGLHTQLAAN 
320 330 340 350 360 370 380 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 493 

A DNA sequence (GBSx0531) was identified in S.agalactiae <SEQ ID 1575> which encodes the amino 
acid sequence <SEQ ID 1576>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0302 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 
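
Each "Final Results" block lists one certainty value per candidate location, and the location with the highest value is the one flagged as affirmative. As a minimal sketch, assuming result lines of the form `bacterial cytoplasm Certainty=0.0302 (Affirmative)`, the following Python reads such a block and picks the best-scoring location. The function name `parse_localisation` is hypothetical, and the pattern tolerates the stray spaces and dashes that appear in some of the listings.

```python
import re
from typing import Dict

# \W* absorbs spaces and dashes between the location and "Certainty";
# the digits are captured separately so "0 . 0302" and "0.0302" both parse.
_CERT = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*"
    r"Certainty\s*=\s*(\d)\s*\.\s*(\d+)"
)


def parse_localisation(text: str) -> Dict[str, float]:
    """Map each predicted location to its certainty value."""
    return {m[1]: float(f"{m[2]}.{m[3]}") for m in _CERT.finditer(text)}


# Result block reported for SEQ ID 1576, with spacing as extracted.
block = """
bacterial cytoplasm Certainty=0 . 0302 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0. 0000 (Not Clear) < suco
"""
scores = parse_localisation(block)
best = max(scores, key=scores.get)  # location with the highest certainty
```

When every certainty is 0.0000, as in some blocks of this listing, `max` simply returns the first location, so a caller may prefer to treat that case as "not clear".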

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1577> which encodes the amino acid 
sequence <SEQ ID 1578>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0302 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 85/100 (85%), Positives = 96/100 (96%) 

Query: 1 MIKILAACGAGVNSSHQIKDAIETQLGDRGYNVHCDAVMVKDITEEMVNKYDIFTPIAKT 60 

MIKILAACGAGVNSSHQIKDAIETQ+ DRGY+VHCDAVMVKDITEE+V++YDIFTPIAKT 
Sbjct: 1 MIKILAACGAGVNSSHQIKDAIETQMSDRGYDVHCDAVMVKDITEELVSRYDIFTPIAKT 60 

Query: 61 DLGFNVPIPVVEAGPILYRIPVMSEPVFTALEQVIKEHNL 100 

DLGF +PIP+VEAGPILYRIP+MSEPVF  LE+VIKE++L 
Sbjct: 61 DLGFEMPIPIVEAGPILYRIPIMSEPVFAELERVIKENHL 100 
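
The summary line printed at the head of each alignment gives raw counts over the aligned length. As a minimal sketch, assuming summary lines of the form `Identities = 85/100 (85%) , Positives = 96/100 (96%)`, the following Python recovers the counts and recomputes the fractions. The names `parse_stats` and `fraction` are illustrative; the percentage printed in parentheses is not used, because the rounding applied by the alignment program is not stated in this document.

```python
import re
from typing import Dict, Tuple

# Captures "<label> = <count>/<aligned length>" for each statistic present.
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_stats(line: str) -> Dict[str, Tuple[int, int]]:
    """Return {label: (count, aligned_length)} for one summary line."""
    return {m[1]: (int(m[2]), int(m[3])) for m in _STAT.finditer(line)}


def fraction(counts: Tuple[int, int]) -> float:
    """Fraction of aligned positions covered by this statistic."""
    count, length = counts
    return count / length


# Summary line of the GAS/GBS alignment for SEQ ID 1576 and SEQ ID 1578.
stats = parse_stats("Identities = 85/100 (85%) , Positives = 96/100 (96%)")
identity = fraction(stats["Identities"])
positives = fraction(stats["Positives"])
```

A `Gaps` entry appears in the result only when the summary line reports one, so callers can test for it with `"Gaps" in stats`.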

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 494 

A DNA sequence (GBSx0532) was identified in S.agalactiae <SEQ ID 1579> which encodes the amino 
acid sequence <SEQ ID 1580>. This protein is predicted to be GatA. Analysis of this protein sequence 
reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2078 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10193> which encodes amino acid sequence <SEQ ID 
10194> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG09977 GB:AF248038 GatA [Streptococcus agalactiae] 
Identities = 39/135 (28%) , Positives = 76/135 (55%) , Gaps = 9/135 (6%) 

Query: 16 QEELFDLVSKALIKQHYVSPNYRQAVKEREREFPTGLKIDLKDGTPIQYVAIPHTETQYC 75 

Q L +++S+ L+++ YV + +A+ +RE+++PTGL+++ VAIPHT ++Y 

Sbjct: 20 QTNLLEVLSQYLLQKGYVKTEFSKAILQREKDYPTGLQLE------NMAVAIPHTYSEYV 73 

Query: 76 LVDRIFYVKNSQPITFKHMINPEEECRVQDFFFIINSRN-SNQSDILSNLITFFITKGNL 134 

L I+ K +PI+F M E+E + + ++ N +Q+ +L+ L+T F + 
Sbjct: 74 LKPFIYINKLKEPISFIQM-GTEDEIVMARYVIVLGISNPKDQAGLLAELMTLFSNPKIV 132 

Query: 135 DRLHELGDNKEKINH 149 

+L E+ KE + + 
Sbjct: 133 QQL-EMAQTKEALKN 146 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1581> which encodes the amino acid 
sequence <SEQ ID 1582>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3130 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/154 (66%) , Positives = 122/154 (78%) 
Query: 4 VTQDILFIDAHSQEELFDLVSKALIKQHYVSPNYRQAVKEREREFPTGLKIDLKDGTPIQ 63 
V +ILF +A +Q ELFDLV+ L K YV+ Y QA+ ERE FPTGLK+DLKDG+ I 
Sbjct: 1 VFPNILFTEARTQPELFDLVASHLEKVGYVTQEYHQALVEREAVFPTGLKVDLKDGSDIL 60 

Query: 64 YVAIPHTETQYCLVDRIFYVKNSQPITFKHMINPEEECRVQDFFFIINSRNSNQSDILSN 123 
Y AIPHTET+YCLVD++ YV+NSQ +TFKHMINPEE+C V DFFFIINS+N Q+ ILSN 

Sbjct: 61 YAAIPHTETKYCLVDQVVYVRNSQALTFKHMINPEEDCLVTDFFFIINSQNEGQTTILSN 120 

Query: 124 LITFFITKGNLDRLHELGDNKEKINHYLIEKGVF 157 
LITFFITKGNL L L D+K+ I++YLIEKGVF 
Sbjct: 121 LITFFITKGNLSYLASLKDDKQAISWYLIEKGVF 154 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 495 

A DNA sequence (GBSx0533) was identified in S.agalactiae <SEQ ID 1583> which encodes the amino 
acid sequence <SEQ ID 1584>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1429 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA25176 GB:M60447 repressor protein [Lactococcus lactis] 
Identities = 139/255 (54%) , Positives = 189/255 (73%) , Gaps = 6/255 (2%) 

Query: 1 MLKRERLQKIIEKVNINGIVTVNEIMEELDVSDMTVRRDLDELDKAGLLIRIHGGAQKVN 60 
M K+ RL+KI++ + I+G +T+ EI++ELD+SDMT RRDLD L+ GLL R HGGAQ ++ 

Sbjct: 7 MNKKRRLEKILDMLKIDGTITIKEIIDELDISDMTARRDLDALEADGLLTRTHGGAQLLS 66 

Query: 61 ASPTPQNYEKSNTEKYDIQTNEKLEIAQFAKQFINDGETIFIGPGTTLEKLATQLLD 117 

+ + EK++ EK + T EK++IA+ A I DG+TIFIGPGTTL +LA +L 
Sbjct: 67 SK---KPLEKTHIEKKSLNTKEKIDIAKKACSLIKDGDTIFIGPGTTLVQLALELKGRKG 123 

Query: 118 FKIRVVTNSLPVFNILNQSSTLDLILVGGEYREITGAFVGSVTINSIKSLNFSKAFVSSN 177 

+KIRV+TNSLPVF ILN S T+DL+L+GGEYREITGAFVGS+ ++K++ F+KAFV +N 
Sbjct: 124 YKIRVITNSLPVFLILNTSETIDLLLLGGEYREITGAFVGSMASTNLKAMRFAKAFVRAN 183 

Query: 178 GVFEKSIATYDEGEGEIQRIALNNSFEKFLLVDSQKFGKYDFYTFYQLDDIDFVLTDHNI 237 

V SIATY + EG IQ++ALNN+ EKFLLVDS KF +YDF+ FY LD +D ++TD+ I 
Sbjct: 184 AVTHNSIATYSDKEGVIQQLALNNAVEKFLLVDSTKFDRYDFFNFYNLDQLDTIITDNQI 243 

Query: 238 DNVVKEQYSSFTKIL 252 

E++S +T IL 
Sbjct: 244 SPQHLEEFSQYTTIL 258 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1585> which encodes the amino acid 
sequence <SEQ ID 1586>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0740 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 161/252 (63%), Positives = 195/252 (76%), Gaps = 3/252 (1%) 



Query: 1 MLKRERLQKIIEKVNINGIVTVNEIMEELDVSDMTVRRDLDELDKAGLLIRIHGGAQKVN 60 

MLKRERL KI E VN GIVTVN+I++ L+VSDMTVRRDLDEL+KAG LIRIHGGAQ + 
Sbjct: 1 MLKRERLLKITEIVNEQGIVTVNDIIQTLNVSDMTVRRDLDELEKAGKLIRIHGGAQSIT 60 

Query: 61 ASPTPQNYEKSNTEKYDIQTNEKLEIAQFAKQFINDGETIFIGPGTTLEKLATQLLDFKI 120 

P E+SN EK +QT EK E+A +A Q +NDGETIFIGPGTTLE A QL + +I 
Sbjct: 61 M---PNKKERSNIEKQTVQTKEKWELASYATQLVNDGETIFIGPGTTLECFAEQLKNRQI 117 

Query: 121 RVVTNSLPVFNILNQSSTLDLILVGGEYREITGAFVGSVTINSIKSLNFSKAFVSSNGVF 180 

R+VTNSLPVFNIL S T+DLIL+GGEYR ITGAFVGS+ +I SL F+KAF+S NG++ 
Sbjct: 118 RIVTNSLPVFNILQDSETIDLILIGGEYRSITGAFVGSLASQNISSLKFAKAFISCNGIY 177 

Query: 181 EKSIATYDEGEGEIQRIALNNSFEKFLLVDSQKFGKYDFYTFYQLDDIDFVLTDHNIDNV 240 

+ IATY E EGEIQ++A NNS EK+LLVD+QKF YDF+ FY L++ID V+TD I 
Sbjct: 178 KNDIATYSETEGEIQKIAFNNSIEKYLLVDNQKFNAYDFFIFYHIMMIDAVVTDSQITED 237 

Query: 241 VKEQYSSFTKIL 252 

V E+YS FT++L 
Sbjct: 238 VIERYSQFTQLL 249 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 496 

A DNA sequence (GBSx0534) was identified in S.agalactiae <SEQ ID 1587> which encodes the amino 
acid sequence <SEQ ID 1588>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3436 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD13797 GB:AF062533 unknown [Streptococcus agalactiae] 
Identities = 86/371 (23%) , Positives = 136/371 (36%) , Gaps = 79/371 (21%) 



Query: 11 DLSESELKAAQEFLSGKSEANQDKPKTGKTAQEIYEAIEPKAIVKPEDLLFGIAQATDYK 70 

DL++ + L K D TG IEP+ V L AT 
Sbjct: 526 DLTQIAFAEQELMLKDKKHYRYDIVDTG IEPRLAVDVSSLPMHAGNATYDT 576 

Query: 71 NGTFVIPHKDHYHYVELKWFDEEKDLIADSDKTYSLEDYLATAKYYMMHPEKRPKVEGWG 130 

+FVIPH DH H V W + +AT KY M HPE RP V W 
Sbjct: 577 619 

Query: 131 KDAEIYKEKDSNKADKPSPAPTDNKSTSNSSDKNLSAAEVFKQAKPEKIVPLDKIAAHMA 190 

K + + + P+ P D ++ + SA EV +K + + AA 
Sbjct: 620 KPGH EESGSVIPNVTPLDKRAGMPNWQI IHSAEEV QKALAEGRFAA--- 665 

Query: 191 YAVGFEDDQLIVPHHDHYHNVPMAWFDKGGLWKAPEGYTLQQLFST--IKYYMEHPNELP 248 

DID WD +G +L+ + + + + EL 
Sbjct: 666 PDGYI FDPRDVLAKETFVWKDGSFS I PRADGSSLRTINKSDLSQAEWQQAQELL 719 

Query: 249 KEKGWGHDSDHNKGSNKDNKAKlJYAPDEEPEDSGKVTHNYGFYDVNKGSDEEEP-EKQED 307 

+K G +D +K P+E+ + +K ++ ++P E ++ 
Sbjct: 720 AKKNAGDATDTDK PEEKQQ ADKSNENQQPSEASKE 754 

Query: 308 ESELDEYELGMAQNAKKYGMDRQSFEKQLIQLSNKYSVSFESFNYINGSQVQVTKKDGSK 367 

E E D++ + YG+DR + E + QL+ K ++ + VQ K+G 
Sbjct: 755 EKESDDF IDSLPDYGLDRATLEDHINQLAQKANID-PKYLIFQPEGVQFYNKNGEL 809 

Query: 368 VLVDIKTLTEV 378 

V DIKTL ++ 
Sbjct: 810 VTYDIKTLQQI 820 

A related DNA sequence was identified in S.agalactiae <SEQ ID 6983> which encodes the amino acid 
sequence <SEQ ID 6984>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 



Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

A related GBS gene <SEQ ID 8581> and protein <SEQ ID 8582> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: 6.06 
GvH: Signal Score (-7.5): -5.61 

Possible site: 26 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 0 value: 2.23 threshold: 0.0 
PERIPHERAL Likelihood = 2.23 6 
modified ALOM score: -0.95 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1589> which encodes the amino acid 
sequence <SEQ ID 1590>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 808/825 (97%), Positives = 816/825 (97%), Gaps = 3/825 (0%) 

Query: 2 KKTYGYIGSVAAILLATHIGSYQLGKHHMGIATKDNQIAYIDDSKGKVKAPKTNKTMDQ 60 

KKTYGYIGSVAAILLATHIGSYQLGKHHMG ATKDNQIAYIDDSKGK KAPKTNKTMDQ 
Sbjct: 2 KKTYGYIGSVAAILLATHIGSYQLGKHHMGSATKDNQIAYIDDSKGKAKAPKTNKTMDQ 60 

Query: 61 ISAEEGISAEQIVVKITDQGYVTSHGDHYHFYNGKVPYDAIISEELLMTDPNYHFKQSDV 120 

ISAEEGISAEQIVVKITDQGYVTSHGDHYHFYNGKVPYDAIISEELLMTDPNY FKQSDV 
Sbjct: 61 ISAEEGISAEQIVVKITDQGYVTSHGDHYHFYNGKVPYDAIISEELLMTDPNYRFKQSDV 120 

Query: 121 INEILDGYVIKVNGNYYVYLKPGSKRKNIRTKQQIAEQVAKGTKEAKEKGLAQVAHLSKE 180 

INEILDGYVIKVNGNYYVYLKPGSKRKNIRTKQQIAEQVAKGTKEAKEKGLAQVAHLSKE 
Sbjct: 121 INEILDGYVIKVNGNYYVYLKPGSKRKNIRTKQQIAEQVAKGTKEAKEKGLAQVAHLSKE 180 

Query: 181 EVAAVNEAKRQGRYTTDDGYIFSPTDIIDDLGDAYLVPHGNHYHYIPKKDLSPSELAAAQ 240 
EVAAVNEAKRQGRYTTDDGYIFSPTDIIDDLGDAYLVPHGNHYHYIPKKDLSPSELAAAQ 
Sbjct: 181 EVAAVNEAKRQGRYTTDDGYIFSPTDIIDDLGDAYLVPHGNHYHYIPKKDLSPSELAAAQ 240 

Query: 241 AYWSQKQGRGARPSDYRPTPAP--GRRKAPIPDVTPNPGQGHQPDNGGYHPAPPRPNDAS 298 

AYWSQKQGRGARPSDYRPTPAP GRRKAPIPDVTPNPGQGHQPDNGGYHPAPPRPNDAS 
Sbjct: 241 AYWSQKQGRGARPSDYRPTPAPAPGRRKAPIPDVTPNPGQGHQPDNGGYHPAPPRPNDAS 300 

Query: 299 QNKHQRDEFKGKTFKELLDQLHRLDLKYRHVEEDGLIFEPTQVIKSNAFGYVVPHGDHYH 358 

QNKHQRDEFKGKTFKELLDQLHRLDLKYRHVEEDGLIFEPTQVIKSNAFGYVVPHGDHYH 
Sbjct: 301 QNKHQRDEFKGKTFKELLDQLHRLDLKYRHVEEDGLIFEPTQVIKSNAFGYVVPHGDHYH 360 

Query: 359 IIPRSQLSPLEMELADRYLAGQTDDNDSGSDHSKPSDKEVTHTFLGHRIKAYGKGLDGKP 418 

IIPRSQLSPLEMELADRYLAGQT+D+DSGSDHSKPSDKEVTHTFLGHRIKAYGKGLDGKP 
Sbjct: 361 IIPRSQLSPLEMELADRYLAGQTEDDDSGSDHSKPSDKEVTHTFLGHRIKAYGKGLDGKP 420 

Query: 419 YDTSDAYVFSKESIHSVDKSGVTAKHGDHFHYIGFGELEQYELDEVANWVKAKGQADELV 478 
YDTSDAYVFSKESIHSVDKSGVTAKHGDHFHYIGFGELEQYELDEVANWVKAKGQADEL 

Sbjct: 421 YDTSDAYVFSKESIHSVDKSGVTAKHGDHFHYIGFGELEQYELDEVANWVKAKGQADELA 480 

Query: 479 AALDQEQGKEKPLFDTKKVSRKVTKDGKVGYIMPKDGKDYFYARYQLDLTQIAFAEQELM 538 
AALDQEQGKEKPLFDTKKVSRKVTKDGKVGY+MPKDGKDYFYAR QLDLTQIAFAEQELM 
Sbjct: 481 AALDQEQGKEKPLFDTKKVSRKVTKDGKVGYMMPKDGKDYFYARDQLDLTQIAFAEQELM 540 

Query: 539 LKDKKHYRYDIVDTGIEPRLAVDLSSLPMHAGNATYDTGSSFVIPHIDHIHVVPYSWLTR 598 

LKDKKHYRYDIVDTGIEPRLAVD+SSLPMHAGNATYDTGSSFVIPHIDHIHVVPYSWLTR 
Sbjct: 541 LKDKKHYRYDIVDTGIEPRLAVDVSSLPMHAGNATYDTGSSFVIPHIDHIHVVPYSWLTR 600 

Query: 599 NQIATIKYVMQHPEVRPDVWSKPGHEESGSVIPNVTPLDKRAGMPNWQIIHSAEEVQKAL 658 

+QIATIKYVMQHPEVRPD+WSKPGHEESGSVIPNVTPLDKRAGMPNWQIIHSAEEVQKAL 
Sbjct: 601 DQIATIKYVMQHPEVRPDIWSKPGHEESGSVIPNVTPLDKRAGMPNWQIIHSAEEVQKAL 660 

Query: 659 AEGRFAAPDGYIFDPRDVLAKETFVWKDGSFSIPRADGSSLRTINKSDLSQAEWQQAQEL 718 

AEGRFA PDGYIFDPRDVLAKETFVWKDGSFSIPRADGSSLRTINKSDLSQAEWQQAQEL 
Sbjct: 661 AEGRFATPDGYIFDPRDVLAKETFVWKDGSFSIPRADGSSLRTINKSDLSQAEWQQAQEL 720 

Query: 719 LAKKNAGDATDTDKPEEKQQADKSNENQQPSEASK-EEKESDDFIDSLPDYGLDRATLED 777 
LAKKNAGDATDTDKP+EKQQADKSNENQQPSEASK EEKESDDFIDSLPDYGLDRATLED 

Sbjct: 721 LAKKNAGDATDTDKPKEKQQADKSNENQQPSEASKEEEKESDDFIDSLPDYGLDRATLED 780 

Query: 778 HINQLAQKANIDPKYLIFQPEGVQFYNKNGELVTYDIKTLQQINP 822 
HINQLAQKANIDPKYLIFQPEGVQFYNKNGELVTYDIKTLQQINP 
Sbjct: 781 HINQLAQKANIDPKYLIFQPEGVQFYNKNGELVTYDIKTLQQINP 825 

SEQ ID 8582 was expressed in E.coli in two different forms. GBS293dNterm was expressed in E.coli as a 
GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 147 (lane 14; MW 74kDa 
+ lanes 17 & 18; MW 48.8kDa). GBS293C was expressed in E.coli as a GST-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figures 148 (lane 2-4; MW 71kDa + lanes 5 & 7; MW 46kDa) and 
182 (lane 7; MW 46kDa). Purified GBS293C-His is shown in Figure 241, lanes 8 & 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 497 

A DNA sequence (GBSx0535) was identified in S.agalactiae <SEQ ID 1591> which encodes the amino 
acid sequence <SEQ ID 1592>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD13797 GB:AF062533 unknown [Streptococcus agalactiae] 
Identities = 213/463 (46%) , Positives = 277/463 (59%) , Gaps = 41/463 (8%) 

Query: 4 KKW-IISMjSVALFGTGVGAYQLGSYNA--QKSDNSVSYVKTDKSDSKAQATAVNKTPD 60 

KKT I +++ L T +G+YQLG ++ DN ++Y+ D S K +A NKT D 

Sbjct: 2 KKTYGYIGSVARILLATHIGSYQLGKHHMGLATKDNQIAYI--DDSKGKVKAPKTNKTMD 59 

Query: 61 QISKEEGISAEQIVVKITDDGYVTSHGDHYHYYNGKVPYDAIISEELIMKDPSYVFNKAD 120 
QIS EEGISAEQIVVKITD GYVTSHGDHYH+YNGKVPYDAIISEEL+M DP+Y F ++D 

Sbjct: 60 QISAEEGISAEQIVVKITDQGYVTSHGDHYHFYNGKVPYDAIISEELLMTDPNYHFKQSD 119 

Query: 121 VINEVraDGYIIKVNGKYYLYLKEGSKRraVRTKEQIQKQREEWSKGGSKGESGKHSSAKT 180 
VINE+ DGY+IKVNG YY+YLK GSKR N+RTK+QI +Q + +K E+ + A+ 

Sbjct: 120 VINEILDGYVIKVNGNYYVYLKPGSKRKNIRTKQQIAEQVAKGTK EAKEKGLAQV 174 

Query: 181 QALS ASVREAKASGRYTTDDGYVFSPTDVIDDMGDAFLVPHGDHFHYIPKADLSPS 236 

LS A+V EAK GRYTTDDGY+FSPTD+IDD+GDA+LVPHG+H+HYIPK DLSPS 
Sbjct: 175 AHLSKEEVAAVNEAKRQGRYTTDDGYIFSPTDIIDDLGDAYLVPHGNHYHYIPKKDLSPS 234 

Query: 237 ELSAAQAYWNRKTGRSGNSS--KPSNSSSYIHASAPSGOTSTGRHANAPISIPRVTHANH 294 

EL+AAQAYW++K GR S +P+ + A P + G+ H 
Sbjct: 235 ELAAAQAYWSQKQGRGARPSDYRPTPAPGRRKAPIPDVTPNPGQGHQPD NGGYH 288 

Query: 295 WSKPAGNHATAPKHHAPTTKPINKDSALDKMLKRLYAQPLYARHVESDGLVYDPAQVNAF 354 
+ P N A+ KH + K ++L +L+ L RHVE DGL+++P QV 

Sbjct: 289 PAPPRPNDASQNKHQ RDEFKGKTFKELLDQLHRLDLKYRHVEEDGLIFEPTQVIKS 344 

Query: 355 TAIGVSIPHGNHFHFIHYKDMSPLELE-ATRMVAEHRGHHIDALGKKDSTEKPKHISHEP 413 
AG +PHG+H+H I +SPLE+E A R +A G+ D + S 

Sbjct: 345 NAFGYVVPHGDHYHIIPRSQLSPLEMELADRYLA GQTDDNDSGSDHSKPS 394 

Query: 414 NKE- PHTEEEHHAVTPKDQRKGKP NSQI VYSAQEIEEAKK 452 

+KE HT H GKP + V+S + I K 

Sbjct: 395 DKEVTHTFLGHRIKAYGKGLDGKPYDTSDAYVFSKESIHSVDK 437 

There is also homology to SEQ ID 1590. 

SEQ ID 1592 (GBS94) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 3; MW 52.5kDa). 

GBS94-His was purified as shown in Figure 194, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 498 

A DNA sequence (GBSx0536) was identified in S.agalactiae <SEQ ID 1593> which encodes the amino 
acid sequence <SEQ ID 1594>. This protein is predicted to be Lmb. Analysis of this protein sequence 
reveals the following: 

Possible site: 24 

>>> May be a lipoprotein 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

There is also homology to SEQ IDs 1596 and 5548. 

A related GBS gene <SEQ ID 8583> and protein <SEQ ID 8584> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 22 Crend: 5 
McG: Discrim Score: 13.64 
GvH: Signal Score (-7.5): -5.75 

Possible site: 24 
>>> May be a lipoprotein 

ALOM program count: 0 value: 4.83 threshold: 0.0 
PERIPHERAL Likelihood = 4.83 33 
modified ALOM score: -1.47 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

SEQ ID 8584 (GBS22) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 14 (lane 6; MW 35kDa). 

The GBS22-His fusion product was purified (Figure 94A; see also Figure 193, lane 4) and used to immunise 
mice (lane 2 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure 94B), FACS 
(Figure 94C), and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

SEQ ID 8584 (GBS22) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 183 (lanes 7 & 8; MW 35kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 499 

A DNA sequence (GBSx0537) was identified in S.agalactiae <SEQ ID 1597> which encodes the amino 
acid sequence <SEQ ID 1598>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.59 Transmembrane 19 - 35 ( 19 - 35) 

Final Results 

bacterial membrane Certainty=0.1235 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA51352 GB:X72832 ORF1 [Streptococcus equisimilis] 

Identities = 104/145 (71%) , Positives = 126/145 (86%) 

Query: 1 MKIIIQRVNQASVSIEDDVVGSIEKGLVLLVGIAPEDTTEDIAYAVRKITSMRIFSDDEG 60 
MK+++QRV +ASVSI+ + G+I +GL+LLVG+ P+D ED+AYAVRKI +MRIFSD +G 
Sbjct: 1 MKLVLQRVKEASVSIDGKIAGAINQGLLLLVGVGPDDAAEDLAYAVRKIVNMRIFSDADG 60 

Query: 61 KMNLSIQDIKGSVLSISQFTLFADTKKGNRPAFTGAADPVKANQFYDIFNQELANHVSVE 120 

KMN SIQDIKGS+LS+SQFTL+ADTKKGNRPAFTGAA P A+QFYD FN++LA+ V VE 
Sbjct: 61 KMNQSIQDIKGSILSVSQFTLYADTKKGNRPAFTGAAKPDMASQFYDRFNEQLADFVPVE 120 

Query: 121 TGQFGADMQVSLINDGPVTIVLDTK 145 

G FGADMQVSLINDGPVTI+LDTK 
Sbjct: 121 RGVFGADMQVSLINDGPVTIILDTK 145 



WO 02/34771 



-590- 



PCT/GB01/04789 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1599> which encodes the amino acid 
sequence <SEQ ID 1600>. Analysis of this protein sequence reveals the following: 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1430 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 103/145 (71%) , Positives = 124/145 (85%) 

Query: 1 MKIIIQRVNQASVSIEDDVVGSIEKGLVLLVGIAPEDTTEDIAYAVRKITSMRIFSDDEG 60 

MK+++QRV +ASVSI+ + G+I +GL+LLVG+ P+D ED+AYAVRKI +MRIFSD +G 
Sbjct: 1 MKLVLQRVKEASVSIDGKIAGAINQGLLLLVGVGPDDNAEDLAYAVRKIVNMRIFSDADG 60 

Query: 61 KMNLSIQDIKGSVLSISQFTLFADTKKGNRPAFTGAADPVKANQFYDIFNQELANHVSVE 120 
KMN SIQDIKGS+LS+SQFTL+ADTKKGNRPAFTGAA P A+Q YD FN++LA V VE 

Sbjct: 61 KMNQSIQDIKGSILSVSQFTLYADTKKGNRPAFTGAAKPDLASQLYDSFNEQLAEFVPVE 120 

Query: 121 TGQFGADMQVSLINDGPVTIVLDTK 145 
G FGADMQVSLINDGPVTI+LDTK 
Sbjct: 121 RGVFGADMQVSLINDGPVTIILDTK 145 

SEQ ID 1598 (GBS368) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 64 (lane 4; MW 20kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 70 (lane A; MW 45kDa). 

GBS368-GST was purified as shown in Figure 215, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 500 

A DNA sequence (GBSx0538) was identified in S.agalactiae <SEQ ID 1601> which encodes the amino 
acid sequence <SEQ ID 1602>. This protein is predicted to be stringent response-like protein (rel) (relA). 
Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.32 Transmembrane 60 - 76 ( 60 - 76) 

Final Results 

bacterial membrane Certainty=0.1128 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA51353 GB:X72832 stringent response-like protein 
[Streptococcus equisimilis] 
Identities = 647/739 (87%) , Positives = 696/739 (93%) , Gaps = 1/739 (0%) 

Query: 1 MVKEINLTGEEVVAITSQYMSETDVAFVKFALNYATAAHYYQARKSGEPYIIHPIQVAGI 60 

M KEINLTGEEVVA+ ++YM+ETD AFVK AL+YATAAH+YQ RKSGEPYI+HPIQVAGI 
Sbjct: 1 MAKEINLTGEEVVALAAKYMNETDAAFVKKALDYATAAHFYQVRKSGEPYIVHPIQVAGI 60 

Query: 61 LADLHLDAVTVACGFLHDVVEDTEITLDEIETDFGKDVRDIIDGVTKLGKVEYKSHEEQL 120 

LADLHLDAVTVACGFLHDVVEDT+ITLD IE DFGKDVRDI+DGVTKLGKVEYKSHEEQL 
Sbjct: 61 LADLHLDAVTVACGFLHDVVEDTDITLDNIEFDFGKDVRDIVDGVTKLGKVEYKSHEEQL 120 

Query: 121 AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 180 

AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 
Sbjct: 121 AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 180 

Query: 181 SRIKWELEDLSFRYLNETEFYKISHMMSEKRREREELVDIIVDKIRSYTEEQGLYGDIYG 240 

SRIKWELEDL+FRYLNETEFYKISHMM+EKRRERE LVD IV KI+SYT EQGL+GD+YG 
Sbjct: 181 SRIKWELEDLAFRYLNETEFYKISHMMNEKRREREALVDDIVTKIKSYTTEQGLFGDVYG 240 

Query: 241 RPKHIYSIYRKMRDKKKRFDQIYDLIAIRCIMETASDVYAMVGYIHELWRPMPGRFKDYI 300 

RPKHIYSIYRKMRDKKKRFDQI+DLIAIRC+MET SDVYAMVGYIHELWRPMPGRFKDYI 
Sbjct: 241 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 300 

Query: 301 AAPKANGYQSIHTTVYGPKGPIEIQIRTKEMHQVAEFGVAAHWAYKKGITSKVNQAEQSV 360 

AAPKANGYQSIHTTVYGPKGPIEIQIRTKEMHQVAE+GVAAHWAYKKG+ KVNQAEQ V 
Sbjct: 301 AAPKANGYQSIHTTVYGPKGPIEIQIRTKEMHQVAEYGVAAHWAYKKGVRGKVNQAEQKV 360 

Query: 361 GMGWIQELVELQDESK-DAKDFVDSVKEDIFTERIYVFTPNGAVQELPRESGPIDFAYAI 419 

GM WI+ELVELQD S DA DFVDSVKEDIF+ERIYVFTP GAVQELP++SGPIDFAYAI 
Sbjct: 361 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPKDSGPIDFAYAI 420 

Query: 420 HTQVGEKATGAKVNGRMVPLTAKLKTGDVVEIITNPNSFGPSRDWIKIVKTNKARNKIRQ 479 

HTQVGEKA GAKVNGRMVPLTAKLKTGDVVEI+TNPNSFGPSRDWIK+VKTNKARNKIRQ 
Sbjct: 421 HTQVGEKAIGAKVNGRMVPLTAKLKTGDVVEIVTNPNSFGPSRDWIKLVKTNKARNKIRQ 480 

Query: 480 FFKNQDKETSINKGRELLVDYFQEQGYVPNKYLDKKHIEEILPRVSVKSEEALYAAVGFG 539 

FFKNQDKE S+NKGR++LV YFQEQGYV NKYLDKK IE ILP+VSVKSEE+LYAAVGFG 
Sbjct: 481 FFKNQDKELSVNKGRDMLVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 540 

Query: 540 DLSPISIFNKLTEKERREEERAKAKAEADELINGGEIKTDKRDVLKVKSENGVIIQGASG 599 
D+SP+S+FNKLTEKERREEERAKAKAEA+EL+NGGEIK + +DVLKV+SENGVIIQGASG 
Sbjct: 541 DISPVSVFNKLTEKERREEERAKAKAEAEELVNGGEIKHENKDVLKVRSENGVIIQGASG 600 

Query: 600 LLMRIAKCCNPVPGDLIEGYITKGRGVAIHRSDCQNLKSQENYEQRLIDVEWDDDGSKKE 659 

LLMRIAKCCNPVPGD IEGYITKGRG+AIHR+DC N+KSQ+ Y++RLI+VEWD D S K+ 
Sbjct: 601 LLMRIAKCCNPVPGDPIEGYITKGRGIAIHRADCNNIKSQDGYQERLIEVEWDLDNSSKD 660 

Query: 660 Y^^IDIYGLNRSGLLNDVLQTLSNATKLVSTVNAQPTKDMKFANIHVSFGISNLAQLTT 719 

Y AEIDIYGLNR GLLNDVLQ LSN+TK + STVNAQPTKDMKFANIHVSFGI NL LTT 
Sbjct: 661 YQAEIDIYGLNRRGLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFGIPNLTHLTT 720 

Query: 720 WDKIKIIPDVYSVKRTNG 738 

W+KIK + PDVYSVKRTNG 
Sbjct: 721 WEKIKAVPDVYSVKRTNG 739 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1603> which encodes the amino acid 
sequence <SEQ ID 1604>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.32 Transmembrane 64 - 80 ( 64 - 80) 

Final Results 

bacterial membrane Certainty=0.1128 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the databases: 



>GP:CAA51353 GB:X72832 stringent response-like protein 
[Streptococcus equisimilis] 
Identities = 700/739 (94%) , Positives = 721/739 (96%) 



Query: 5 MAKIMNVTGEEVIALAATYMTKADVAFVAKALAYATAAHFYQVRKSGEPYIVHPIQVAGI 64 
MAK +N+TGEEV+ALAA YM + D AFV KAL YATAAHFYQVRKSGEPYIVHPIQVAGI 
Sbjct: 1 MAKEINLTGEEVVALAAKYMNETDAAFVKKALDYATAAHFYQVRKSGEPYIVHPIQVAGI 60 

Query: 65 LADLHLDAVTVACGFLHDVVEDTDITLDEIEADFGHDARDIVDGVTKLGEVEYKSHEEQL 124 
LADLHLDAVTVACGFLHDVVEDTDITLD IE DFG D RDIVDGVTKLG+VEYKSHEEQL 

Sbjct: 61 LADLHLDAVTVACGFLHDVVEDTDITLDNIEFDFGKDVRDIVDGVTKLGKVEYKSHEEQL 120 

Query: 125 AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 184 
AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 
Sbjct: 121 AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 180 

Query: 185 SRIKWELEDLAFRYLNETEFYKISHMMKEKRREREALVEAIVSKVKTYTTQQGLFGDVYG 244 

SRIKWELEDLAFRYLNETEFYKISHMM EKRREREALV+ IV+K+K+YTT+QGLFGDVYG 
Sbjct: 181 SRIKWELEDLAFRYLNETEFYKISHMMNEKRREREALVDDIVTKIKSYTTEQGLFGDVYG 240 

Query: 245 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 304 

RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 
Sbjct: 241 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 300 

Query: 305 AAPKANGYQSIHTTVYGPKGPIEIQIRTKDMHQVAEYGVAAHWAYKKGVRGKVNQAEQAV 364 

AAPKANGYQSIHTTVYGPKGPIEIQIRTK+MHQVAEYGVAAHWAYKKGVRGKVNQAEQ V 
Sbjct: 301 AAPKANGYQSIHTTVYGPKGPIEIQIRTKEMHQVAEYGVAAHWAYKKGVRGKVNQAEQKV 360 

Query: 365 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPKESGPIDFAYAI 424 
GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPK+SGPIDFAYAI 

Sbjct: 361 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPKDSGPIDFAYAI 420 

Query: 425 HTQIGEKATGAKVNGRMVPLTAKLKTGDVVEIITNANSFGPSRDWVKLVKTNKARNKIRQ 484 
HTQ+GEKA GAKVNGRMVPLTAKLKTGDVVEI+TN NSFGPSRDW+KLVKTNKARNKIRQ 
Sbjct: 421 HTQVGEKAIGAKVNGRMVPLTAKLKTGDVVEIVTNPNSFGPSRDWIKLVKTNKARNKIRQ 480 

Query: 485 FFKNQDKELSVNKGRDLLVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 544 

FFKNQDKELSVNKGRD+LVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 
Sbjct: 481 FFKNQDKELSVNKGRDMLVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 540 

Query: 545 DISPISVFNKLTEKERREEERAKAKAEAEELVKGGEVKHENKDVLKVRSENGVIIQGASG 604 

DISP+SVFNKLTEKERREEERAKAKAEAEELV GGE+KHENKDVLKVRSENGVIIQGASG 
Sbjct: 541 DISPVSVFNKLTEKERREEERAKAKAEAEELVNGGEIKHENKDVLKVRSENGVIIQGASG 600 

Query: 605 LLMRIAKCCNPVPGDPIDGYITKGRGIAIHRSDCHNIKSQDGYQERLIEVEWDLDNSSKD 664 

LLMRIAKCCNPVPGDPI+GYITKGRGIAIHR+DC+NIKSQDGYQERLIEVEWDLDNSSKD 
Sbjct: 601 LLMRIAKCCNPVPGDPIEGYITKGRGIAIHRADCNNIKSQDGYQERLIEVEWDLDNSSKD 660 

Query: 665 YQAEIDIYGLNRSGLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFGIPNLTHLTT 724 
YQAEIDIYGLNR GLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFGIPNLTHLTT 

Sbjct: 661 YQAEIDIYGLNRRGLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFGIPNLTHLTT 720 

Query: 725 WEKIKAVPDVYSVKRTNG 743 
WEKIKAVPDVYSVKRTNG 
Sbjct: 721 WEKIKAVPDVYSVKRTNG 739 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 635/739 (85%) , Positives = 691/739 (92%) , Gaps = 1/739 (0%) 

Query: 1 MVKEINLTGEEVVAITSQYMSETDVAFVKFALNYATAAHYYQARKSGEPYIIHPIQVAGI 60 

M K +N+TGEEV+A+ + YM++ DVAFV AL YATAAH+YQ RKSGEPYI+HPIQVAGI 
Sbjct: 5 MAKIMNVTGEEVIALAATYMTKADVAFVAKALAYATAAHFYQVRKSGEPYIVHPIQVAGI 64 

Query: 61 LADLHLDAVTVACGFLHDVVEDTEITLDEIETDFGKDVRDIIDGVTKLGKVEYKSHEEQL 120 
LADLHLDAVTVACGFLHDVVEDT+ITLDEIE DFG D RDI+DGVTKLG+VEYKSHEEQL 

Sbjct: 65 LADLHLDAVTVACGFLHDVVEDTDITLDEIEADFGHDARDIVDGVTKLGEVEYKSHEEQL 124 

Query: 121 AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 180 
AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 
Sbjct: 125 AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 184 



Query: 181 SRIKWELEDLSFRYLNETEFYKISHMMSEKRREREELVDIIVDKIRSYTEEQGLYGDIYG 240 
SRIKWELEDL+FRYLNETEFYKISHMM EKRRERE LV+ IV K+++YT +QGL+GD+YG 
Sbjct: 185 SRIKWELEDLAFRYLNETEFYKISHMMKEKRREREALVEAIVSKVKTYTTQQGLFGDVYG 244 

Query: 241 [illegible] 300 
RPKHIYSIYRKMRDKKKRFDQI+DLIAIRC+MET SDVYAMVGYIHELWRPMPGRFKDYI 
Sbjct: 245 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 304 

Query: 301 [illegible] 360 
AAPKANGYQSIHTTVYGPKGPIEIQIRTK+MHQVAE+GVAAHWAYKKG+ KVNQAEQ+V 
Sbjct: 305 AAPKANGYQSIHTTVYGPKGPIEIQIRTKDI^QV^ 364 

Query: 361 [illegible] 419 
GM WI+ELVELQD S DA DFVDSVKEDIF+ERIYVFTP GAVQELP+ESGPIDFAYAI 
Sbjct: 365 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPKESGPIDFAYAI 424 

Query: 420 [illegible] 479 
HTQ+GEKATGAKVNGRMVPLTAKLKTGDVVEIITN NSFGPSRDW+K+VKTNKARNKIRQ 
Sbjct: 425 HTQIGEKATGAKVNGRMVPLTAKLKTGDVVEIITNANSFGPSRDWVKLVKTNKARNKIRQ 484 

Query: 480 [illegible] 539 
FFKNQDKE S+NKGR+LLV YFQEQGYV NKYLDKK IE ILP+VSVKSEE+LYAAVGFG 
Sbjct: 485 FFKNQDKELSVNKGRDLLVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 544 

Query: 540 [illegible] 599 
D+SPIS+FNKLTEKERREEERAKAKAEA+EL+ GGE+K + +DVLKV+SENGVIIQGASG 
Sbjct: 545 DISPISVFNKLTEKERREEERAKAKAEAEELVKGGEVKHENKDVLKVRSENGVIIQGASG 604 

Query: 600 [illegible] 659 
LLMRIAKCCNPVPGD I+GYITKGRG+AIHRSDC N+KSQ+ Y++RLI+VEWD D S K+ 
Sbjct: 605 LLMRIAKCCNPVPGDPIDGYITKGRGIAIHRSDCHNIKSQDGYQERLIEVEWDLDNSSKD 664 

Query: 660 YNlAEIDIYGLNRSGLLNDVLQTLSNATKLVSTVNAQPTKDMKFANIHVSFGISNLAQLTT 719 
Y AEIDIYGLNRSGLLNDVLQ LSN+TK +STVNAQPTKDMKFANIHVSFGI NL LTT 
Sbjct: 665 YQAEIDIYGLNRSGLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFGIPNLTHLTT 724 

Query: 720 WDKI KI I PDVYSVKRTNG 738 
W+KIK + PDVYSVKRTNG 
Sbjct: 725 WEKI KAVPDVYSVKRTNG 743 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 501 

A DNA sequence (GBSx0539) was identified in S.agalactiae <SEQ ID 1605> which encodes the amino 
acid sequence <SEQ ID 1606>. This protein is predicted to be a 2',3'-cyclic-nucleotide 2'-phosphodiesterase 
precursor (cpdB). Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -5.79 Transmembrane 779 - 795 ( 778 - 797) 

Final Results 

bacterial membrane Certainty=0.3314 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12613 GB:Z99108 similar to 2',3'-cyclic-nucleotide 
2'-phosphodiesterase [Bacillus subtilis] 
Identities = 297/630 (47%) , Positives = 419/630 (66%) , Gaps = 21/630 (3%) 

Query: 102 KVDLRIMSTTDLHTNLVNYDYYQDKESQKIGIAKTAVLIEEAKKENPNTVLTO 161 
+V L I++TTD+H N+++YDYY DKE+ GLA+TA LI++ +++NPNT+LVDNGD+IQG 
Sbjct: 42 QVHLSILATTDIHANMMDYDYYSDKKTADFGLARTAQLIQKHREQNPNTLLVDNGDLIQG 101 

Query: 162 TPLGTYKAIVKP VAENEEHP^QAMNALGYDASTLGNHEENYGLDYLKKIIATANLP 218 

PLG Y + ++ + HP+ MNAL YDA TLGNHEFNYGLD+L I A+ P 
Sbjct: 102 NPLGKYAVKYQKDDIISGTKTHPIISVMNALKYDAGTLGNHEFNYGLDFLDGTIKGADFP 161 

Query: 219 ILNAWLDFKTHQPVFKTYDIITKTFKDSTGRAVA1MIGITGIVPPQIRJWDKANLEGKV 278 

I+NANV +++YIKTDG ++GG VPPQI+ WDK NLEG+V 
Sbjct: 162 IVNANVKT-TSGEI^YTPYVINEKTLIDENGNEQKVKVGYIGFVPPQIMTWDKKNLEGQV 220 

Query: 279 IVKDSVKAIEEIVPTMRAKGADVILVLSHSGIGDDRYEEGEENVGYQIAS-IKGVDAWT 337 

V+D V++ E +P M+A+GADVI+ L+H+GI G EN + +A+ KG+DA+++ 

Sbjct: 221 QVQDIVESANETIPKMKAEGADVIIALAHTGIEKQAQSSGAENAVFDLATKTKGIDA1IS 280 

Query: 338 GHSHAEFPSGNGTGFYEKYTGVDGIN GKINGTPVTMAGKYGDHLGIIDLGLSYTNGK 394 

GH H FPS +Y GV N G ING PV M +G +LG+IDL L +G 

Sbjct: 281 GHQHGLFPSA EYAGVAQFNVEKGTINGIPWMPSSWGKYLGVIDLKLEKADGS 333 

Query: 395 WQVSESSAKIRKIDMNSTTADERIIALAKEAHDGTIWYVRQQVGTrTAPITSYFAIjVKDD 454 

W+V++S I I N T+ +E + ++ H T+ YVR+ VG T A I S+FA VKDD 
Sbjct: 334 WKVADSKGSIESIAGNVTSRKETVTNTIQQTHQNTLEYWKPVGKTFADINSFFAQVKDD 393 

Query: 455 PSVQIVNNAQRWYVANELKGTPEANLPLLSAAAPFKAGTRGDATAYTDIPAGPVAIKNVA 514 

PS+QIV +AQ+WY E+K T NLP+LSA APFKAG R A YT+IPAG +AIKNV 
Sbjct: 394 PSIQIVTDAQKWYAEKEMKDTEYKNLPILSAGAPFKAGGRNGAMYYTNIPAGDIAIKWG 453 

Query: 515 DLYLYDNVTALLKVTGADLREWLEMSAGQFNQIDPNNKAPQNIINTEYRTYNFDVIDGLT 574 

DLYLYDN ++K+TG+++++WLEMSAGQFNQIDP Q ++N +R+YNFDVIDG+T 

Sbjct: 454 DLYLYDNTVQIVKLTGSEVKDWLEMSAGQFNQIDPAKGGDQALIJirENFRSYNFDVIDGVT 513 

Query: 575 YKFDITQPNKYNKDGKWNSQASRVRDLMYNGKPVADKQEFMIVTNNYRASGTFPGAKNA 634 

Y+ D+T+P KYN++GKV+N+ +SR+ +L Y GKP++ QEF++VTNNYRASG G + 
Sbjct: 514 YQVDVTKPAKYNENGKVINADSSRIINLSYEGKPISPSQEFLVVTNNYRASGG-GGFPHL 572 

Query: 635 TMNRLIiN LENRQTIINYIISEKTINPTADNNWGFTESIKDLDLRFQTADKAKNLVTN 691 

T +++++ +ENRQ +++YII +KT+NP ADNNW + +L F+++ AK 

Sbjct: 573 TSDKIVHGSAVENRQVLMDYIIEQKTVNPKADNNWSIA-PVSGTNLTFESSLLAKPFADK 631 

Query: 692 SKDIQYIASSTKDEGFGDYRFVYTEQEKVD 721 

+ D+ Y+ S +EG+G Y+ + + D 
Sbjct: 632 ADDVAYVGKSA-NEGYGVYKLQFDDDSNPD 660 
Identities = 133/567 (23%), Positives = 214/567 (37%), Gaps = 147/567 (25%) 

Query: 104 DLRIMSTTDLHTKLWYDYYQDKESQKIGIAKTAVLIEFAKKENPNTVLVDNGDVIQGTP 163 

DL +M T D H +L + A+ I E + E + +L+D GDV G 

Sbjct: 668 DLTVMHTNDTHAHLDD AARRMTKINE VRSETNHNILLDAGDVFSGD - 713 

Query: 164 LGTYKAIVKPVAENEEHPMYQAMNALGYDASTLGNHEFNYG LDYLKKI IATAN 216 

Y +A+ + MN +GYDA T GNHEF+ G D+L AT + 

Sbjct: 714 --LYFTKWNGLAD LKMMNM^GYDAMTFGNHEFDKGPTVLSDFLSGNSATVDPAN 765 

Query: 217 LPIIjNANVLDFKTHQPVFKTYDIITKTF KDSTGRAVALNIGITG- -IV 262 

PI++ANV +++P K++ +TF K G + + + G + 
Sbjct: 766 RYHFFAPEFPIVSANV--DVSNEPKLKSFVKKPQTFTAGEKKEAGIHPYILLDVDGEKVA 823 

Query: 263 PPQILNWDKANLE--GKVIV KDSVKAIEEIVPTMRAKGADVILVLSHSGIGD 312 

+ DA GK IV +++VKAI+E + + 1+ L+H G 
Sbjct: 824 VFGLTTEDTATTSSPGKSIVFNDAFETAQNTVKAIQE EEKVNKIIALTHIG 874 

Query: 313 DRYEEGEENVGYQIA-SIKGVDAWTGHSHAEFPSGNGTGFYEKYTGVDGINGKINGTP- 370 

W ++A +KG+D ++ GH+H T VD + N P 

Sbjct: 875 HNRDLELAKKVKGIDLI IGGHTH TLVDKMEWNNEEPT 912 

Query: 371 -VTMAGKYGDHLGIIDLGLSYTNGKWQVSESSAKIRKID^lNSTTADERIIAIlAKEAHDGT 429 

V A +YG LG +D+ G Q +S+ + ID ++ E AK+ D 

Sbjct: 913 IVAQAKEYGQFLGRVDVAFD-EKGWQTDKSNLSVLPIDEHTEENPE AKQELDQF 966 

Query: 430 INYV RQQVGTTTAPITSYFALVKDDPSVQIVNNAQRWYVANELKGTPEANLPLLSA 485 
N + ++VG T + + QR +V + + A 

Sbjct: 967 KNELEDVKNEKVGYT DVALDGQREHVRTKETNLGNF I ADGMLA 1009 

Query: 486 AAPFKAGTRGDAT AYTD I PAGPVAI KNVADLYLYDNVTALLKVTGADLREWLEMSA 541 

A AGR T IG + +V+++N + +TG ++E LE 
Sbjct: 1010 KAKEAAGARIAITNGGGIRAGIDKGDITLGEVIiNVMPFGNTLYVADLTGKQIKEALE 1066 

Query: 542 GQFNQIDPNNKAPQNIINTEYRTYNFDVIDGLTYKFDITQPNKVNKDGKWNSQASRVRD 601 

Q + NE F+G+YF+ NKG + V+ 

Sbjct: 1067 CGLSNVENGGGAFPQVAGIEYTFTLN NKPG HRVLEVKI 1104 

Query: 602 LMYNGKPVADKQE- - FMIVTNNYRASG 626 

NG VA + + + TNN+ +G 
Sbjct: 1105 ESPNGDKVAINTDDTYRVATNNFVGAG 1131 

There is also homology to SEQ ID 1608. A related sequence was also identified in GAS <SEQ ID 9129> 
which encodes the amino acid sequence <SEQ ID 9130>. Analysis of this protein sequence reveals the 
following: 

Possible cleavage site: 27 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -4.67 Transmembrane 649 - 665 ( 648 - 666) 
INTEGRAL Likelihood = -2.02 Transmembrane 6 - 22 ( 5 - 22) 
PERIPHERAL Likelihood = 1.85 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8585> and protein <SEQ ID 8586> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: 6.68 
GvH: Signal Score (-7.5): 0.87 

Possible site: 28 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 1 value: -5.79 threshold: 0.0 

INTEGRAL Likelihood = -5.79 Transmembrane 779 - 795 ( 778 - 797) 
PERIPHERAL Likelihood = 0.53 251 
modified ALOM score: 1.66 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.3314 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

LPXTG motif: 769-773 

The protein has homology with the following sequences in the databases: 

ORF01378(298 - 2337 of 3000) 

GP|6782402|emb|CAB70615.1| |AJ133440 (1 - 680 of 683) cyclo-nucleotide phosphodiesterase, 
putative {Streptococcus dysgalactiae subsp. equisimilis} 
%Match = 38.3 

%Identity =59.0 %Similarity =72.3 

Matches = 403 Mismatches = 181 Conservative Sub.s = 91 

105 135 165 195 225 255 285 315 

LFYHFLT* K* KKLEAQKELXTK*MCLTKLSFINKRLFLV* SLKI IRK*D* LNVFNKL* * FL *DNIHVMF*WRRFMSKHY 

hi I 
MMTKGY 
345 375 405 435 465 495 525 555 

FSKSVFALTVLTATATSGLAVQAEDIVTTPSSTSTKVESTTPTSTIAEEKSNVTSTPTAITDASTATS1^TT¥^NPQP 

III I =1 I := | :||: |:| :|| |: || III | \f 7 : = 

MSKSAIFLAMLVAAGSAQLT-KAEETTAVEPLTTT-ANTTTSTAVPAETAPLVADTTPATATADTAVPSPVNSTSSE-MA 
20 30 40 50 60 70 80 



585 615 645 675 705 735 765 795 

VATEATTSDLKPIEGEKVDLRIMSTTDIiH^ 

1= = 1=11= ll=ll=llllll=llllllllllll=l =111111111= lllll I 111111-1:11 
TASADNATOTAPVEGQSVDTOILSTTDLHSNLVNYDYYQDKEAQS 

100 110 120 130 140 150 160 

825 855 885 915 945 975 1005 1035 

TYKAIVKPVAENEEHPNTYQAMNALGYDASTLGNHEFNYGL^ 

mill ii =i mi i= ii =iiiiiiiiiiiiiiiii -in i i i mi n =i ii =iniin 

TYKAIVDPVEADEVHPMYAALKAMFDASTLGIfflEFNYGL^ 

180 190 200 210 220 230 240 



1065 1095 1125 1155 1185 1215 1243 1273 

DSTGRAVALNIGITGIVPPQILNWDKANLEGKVIVKDSVKAIEEIVPTMRAKGADVILVLS TLALEMIDMKKVKKTLAI 



DKDGKTVSLKIGITGWPPQIMSWDKANLTGKVIVKDAVEAVKEVIPTIRAAGADLVLVIA! TLVSVMTSMKSVKKMLVT 
260 270 280 290 300 310 320 



1303 1357 1387 1416 1446 1476 1500 

KLPASREWMPLLRDTHTL - -NFHQVTVIASMKNTLELMVSM KINGTPVTMAGKYGDHLGI IDLGLSYTNGKWQVS - - ES 

ii i = i = i i =i= i= i = == in mi ninnnni ninnnn =i 

KLLALKVLMQWSQAIHMLISQPYQMAVFTITSKVLMVKRAL ! - INGVPVTMGGKYGDHLGLIDLNLTYTNGQWKVNKDQS 
340 350 360 370 380 390 400 



1530 1560 1590 1620 1650 1680 1710 1740 

SAKIRKIDMNSTTADERIIALAKEAHDGTINYTOQQVQTTTAPIT^^ 

i= mi i i niiniinih iinmiiiin inihiinnin nun n iniiii 

RAETRQIDSKSNQVDPTIIALAKEAHDGWAYVRQQVGTTTAPINSYFALIKDDPSIQIVNN^ 

410 420 430 440 450 460 470 480 

1770 1800 1830 1860 1887 1917 1947 1977 

PLLSAAAPFKAGTRGDATAYTDIPAGPVAI KNVADLYLYDNVTAL - LKVTGADLREWLEMSAGQFNQIDPNNKAPQNI IN 

iiniiinin I == == = =1 I i I I I I I I : .11 |||= ll = = = l 

PLLSAAAPFKAGYTKmRQLILIFLLVQSLSKMSLTFTCTTTSIiLFLKOTGADLKEWLEMSAGQFNTIDPSKSEPQDLVN 
490 500 510 520 530 540 550 560 

2007 2037 2067 2097 2127 2157 2187 2217 

TEYRTYNFDVIDGLTYKFDITQPNKYNKDGKVVNSQASRVRDL^TYNGKPVADKQEFMIVTKNYRASGTFPG 

I 11111111111111 = 11 = 11 111= I =11 III 11 = 1 Ml = 11111 = 111111 III III 1111 = 111 
TSYRTYNFDVIDGLTYEFDTOQKNKYDSKGNLWPDASRV 

570 580 590 600 610 620 > 630 640 

2247 2277 2307 2337 2367 2397 2427 2457 

LNLEISTRQTIINYIISEKTINPTADNNWGFTESIKD 

mini nimminmnii i = = n 1 = 111 

LNLENRQAI INYI VAEKT INPSADNNWYFADTI KGLNLRFLKR 
650 660 670 680 

SEQ ID 8586 (GBS53) was expressed in E.coli as a His-fusion product. The purified protein is shown in 
Figure 196, lane 9. 

Example 502 

A DNA sequence (GBSx0540) was identified in S.agalactiae <SEQ ID 1609> which encodes the amino 
acid sequence <SEQ ID 1610>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0296 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 503 

A DNA sequence (GBSx0541) was identified in S.agalactiae <SEQ ID 1611> which encodes the amino 
acid sequence <SEQ ID 1612>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1504 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10195> which encodes amino acid sequence <SEQ ID 
10196> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12860 GB:Z99109 similar to glucanase [Bacillus subtilis] 
Identities = 212/345 (61%) , Positives = 268/345 (77%) , Gaps = 1/345 (0%) 

Query: 30 SMETTLNYIKTLTSIPSPTGFTQTIMTYIIKELEAFGYSPIRTNKGGVMVSLKGKNDTKH 89 
S+ T+ IK L SIPSPTG T ++ YI L+ + +R +KGG++ +L G++ ++H 
Sbjct: 3 SWKTMELIKELVSIPSPTGNTYEVINYIESLLKEWKVETVRNHKGGLIATLPGRDTSRH 62 

Query: 90 RMITAHLDTLGAMVRAIKPDGRLKIDLIGGYTYNAIEGENCTIHLSKNGKEISGTALIHQ 149 
RM+TAH+DTLGAMV+ IK DGRLKIDLIGG+ YN+IEGE C I + +GK +GT L+HQ 
Sbjct: 63 RMLTAHVDTLGAMVKEIKADGRLKIDLIGGFRYNSIEGEYCQIETA-SGKMYTGTILMHQ 121 

Query: 150 TSVHVYKDAGTAERNQTNMEIRLDEKVTTADETRALGIQVGDFISFDPRTIITDSGFIKS 209 
TSVHVYKDAG AERNQ NMEIRLDE V +T LGI VGDF+SFDPR IT SGFIKS 
Sbjct: 122 TSVHVYKDAGKAERNQENMEIRLDEPVHCRKDTEELGIGVGDFVSFDPRVEITSSGFIKS 181 

Query: 210 RYLDDKVSAGILMELLSWKKEDIQLPYTTHFYFSAFEELGHGANSSIPNETVEYLAVDM 269 
R+LDDK S +L+ L+ + EDI+LPYTTHF S EE+G+G NS+IP ETVEYLAVDM 
Sbjct: 182 RHLDDKASVALLLRLIHEIQTEDIELPYTTHFLISNNEEIGYGGNSNIPPETVEYLAVDM 241 

Query: 270 GAMGDDQETDEYTVSICVKDASGPYHYELRQHLVSLAENNNIPYKLDIYPYYGSDASAAM 329 
GA+GD Q TDEY+VSICVKDASGPYHY+LR+HLV LAE ++I YKLDIYPYYGSDASAA+ 
Sbjct: 242 GAIGDGQATDEYSVSICVKDASGPYHYQLRKHLVQLAEKHHIDYKLDIYPYYGSDASAAI 301 

Query: 330 RAGAEVKHALLGAGIESSHSYERTHIDSIQATELLVDAYLKSNMV 374 
++G ++ H L+G GI++SH++ERTH S++ T L+ Y++S MV 
Sbjct: 302 KSGHDIWGIiIGPGIDASHAFERTHKSSLRHTAKLLYYYVQSPMV 346 

There is also homology to SEQ ID 424. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 504 

A DNA sequence (GBSx0542) was identified in S.agalactiae <SEQ ID 1613> which encodes the amino 
acid sequence <SEQ ID 1614>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3157 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 






>GP:AAF11472 GB:AE002031 conserved hypothetical protein [Deinococcus radiodurans] 
Identities = 55/150 (36%) , Positives = 85/150 (56%) , Gaps = 2/150 (1%) 

Query: 5 LIIIRGNSASGKSTIAKQLQAELGENTLLLSQDYLRREMLGTKDGENTTTIPLLINLLNY 64 
LI++RGNS SGKS++A+ L+ G + QDYLRR +L D I L+ + Y 
Sbjct: 23 LIVLRGNSGSGKSSVARALRERFGYGIAWVEQDYLRRVLLREHDVAGGKNIGLIETNVRY 82 

Query: 65 GYHNCSYIILEGILRSDWYTPVWKHILKHNPNNTYAYYYDLSFQETVKRHSTRLKSLEFG 124 
S +LEGIL S Y P+ + + H + +Y+DL F+ETV+RH+TR ++ +FG 
Sbjct: 83 CLSAGSVTVLEGILFSRHYGPMLERL--HADFGGHWFYFDLPFEETVRRHATRPQAADFG 140 

Query: 125 EDSLARWWLEKDFLKEIPEKILTKAMSLED 154 
+ W+ +D L + E+++ A SL D 
Sbjct: 141 VQDMQAWFQARDVLPFVQEQLIGPASSLAD 170 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 505 

A DNA sequence (GBSx0543) was identified in S.agalactiae <SEQ ID 1615> which encodes the amino 
acid sequence <SEQ ID 1616>. This protein is predicted to be periplasmic-iron-binding protein BitC. 
Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-11.46 Transmembrane 9 - 25 ( 5 - 30) 

Final Results 

bacterial membrane Certainty=0.5585 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD18094 GB:U75349 periplasmic-iron-binding protein BitA 
[Brachyspira hyodysenteriae] (ver 2) 
Identities = 114/331 (34%), Positives = 184/331 (55%), Gaps = 3/331 (0%) 

Query: 11 YILLWSIIFISVFTYSISQPSKLLPPKELVILSPNSQAILTGTIPAFEEKY-GIKVKLI 69 
+I+ + ++ S SK LVI + ++ + F+ K I V+++ 
Sbjct: 4 FIIFCMLMLSMTLFYSCSSGDSK--NANSLVIYCSHPLDLMNTILDDFKAKNPDINVE W 61 

Query: 70 QGGTGQLIDRLSKEGKQLKADIFFGGNYTQFESHKALFESYVSKNVHTVIPDYIHPSDTA 129 
GTG+L+ R+ E D+ +GG +S LFE+Y S N ++ ++ + 
Sbjct: 62 TAGTGELLKRVEAEKMNPLGDVLWGGTI^SVKSKTDLFENYTSTNEANILDEFKNTEGPF 121 

Query: 130 TPYTINGSVLIVNNELAKGLTIKSYEDLLQPSLKGKIAFADPNTSSSAFSQLTNILLAKG 189 
T ++ S+L+VN LA + I+ YEDLL P LKGKIA ADP+ SSSAF L N+L A G 
Sbjct: 122 TRFSAIPSILMVNTNLAGNIKIEGYEDLLNPELKGKIAAADPSASSSAFEHLVNMLYAMG 181 

Query: 190 GYTNPKAWNYVKKLQHNINAIKSSSSSEVYQSVAEGKMIVGLTYEDPSVNLQKSGANVSI 249 
K W+YV+KL N++ S SS VY+ VA+G+ VGLTYE+P ++ SG+ V + 
Sbjct: 182 KGDPEKGWDYVQKLCANLDGKLLSGSSAVYKGVADGEYTVGLTYEEPGISYMSSGSPVKV 241 

Query: 250 VYPTEGTVFVPSSVAIIKNAPSMKEAKLFINFMLSLDVQNAFGQSTSNRPIRKDAQTSNG 309 
+Y EG + P V IIK +++ AK FI++ +SLD QN + S E IE DA ++ 
Sbjct: 242 IYMKEGVISKPDGVYIIKGGKNLENAKKFIDYCVSLDAQNMLVEKLSRRSIRSDAWTDM 301 

Query: 310 MKALKDIATLKEDYRYVTKHKGQILKTYNRI 340 
+K + +I ++ ++ V + + + L + I 
Sbjct: 302 VKPMSEIYSITDNADWEESRQKWLDKFKDI 332 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1617> which encodes the amino acid 
sequence <SEQ ID 1618>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-13.16 Transmembrane 9 - 25 ( 4 - 33) 

Final Results 

bacterial membrane Certainty=0.6265 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAB95371 GB:U75349 periplasmic-iron-binding protein BitC 
[Brachyspira hyodysenteriae] 
Identities = 115/324 (35%) , Positives = 177/324 (54%) , Gaps = 8/324 (2%) 

Query: 15 VIIILAIVNVAMYIF SSSKKDSAKELVILTPNSQTILTGTIPAFEEKY-GVKVRL 68 
+++I + ++++IF S S S LVI P+ + + F+ K G+ V + 
Sbjct: 4 IVLIFTSLLLSVFIFYSCSSSESGAQSGNSLVIYCPHPLEFINPLVDDFKAKNPGINVDI 63 

Query: 69 IQGGTGQLIDQL-GRKDKPLNADIFFGGNYTQFESHKDLFESYVSPQVSTVISDYQLPSH 127 
I GTG+L+ ++ KD PL DI +GG + + DLFESY S + Y+ 
Sbjct: 64 IAAGTGELLKRVESEKDNPLG-DILWGGTISMAKPKIDLFESYTSTNEENIAEIYKNTEG 122 

Query: 128 RATPYTINGSVLIVNNELARGLHITSYEDLLQPALKGKIAFADPNSSSSAFSQLTNILLA 187 
T T S+L+VN LA + I YEDLL P LKGKIAFADP++SSS+F L N+L A 
Sbjct: 123 ALTRCTAVPSILMVNTNLAGDIKIEGYEDLLNPELKGKIAFADPSASSSSFEHLVNMLYA 182 

Query: 188 KGGYTNADAWAYMKRLLVNMNSIRATSSSEVYQSVAEGKMIVGLTYEDPCINLQKSGANV 247 
G W Y+ +L N++ + SS VY+ VA+G+ VGLT+E+ N +G+ V 
Sbjct: 183 IGKGDPEKGWDYVSKLCANLDGKLLSGSSAVYKGVADGEYTVGLTFEEGGANYVSAGSPV 242 

Query: 248 SIVYPKEGTVFVPSSVAIIKHAPNMTEAKLFINFMLSRDVQNAFGQSTSNRPIRQDAQTS 307 
+VY KEG + P + IIK+A N+ AK F+++ SDQ +R+RD S 
Sbjct: 243 KLVYMKEGVIIKPDGIYIIKNAKNLENAKKFVDYATSYDAQKTITDKLNRRSVRGDLPPS 302 

Query: 308 HDMKALETIATLKEDYAYVTKHKK 331 
+++++TI + +D A V ++K+ 

Sbjct: 303 AILQSVDTINVITDDEAWDQNKQ 326 
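
The percentage figures in the summary line that heads each alignment can be cross-checked against the raw counts. The following is a minimal sketch only: the helper names `parse_summary` and `check_summary` are hypothetical, and the sketch assumes that the reported percentages are truncated to an integer rather than rounded, which is consistent with the figures reported for the alignment above.

```python
import re

# Matches one field of a summary line, e.g. "Identities = 115/324 (35%)".
# The optional whitespace tolerates spacing noise introduced by text extraction.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)} (hypothetical helper)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_FIELD.findall(line)
    }


def check_summary(line):
    """List the fields whose reported percentage differs from floor(100 * count / length).

    Assumption: percentages are truncated, not rounded.
    """
    mismatches = []
    for name, (count, length, percent) in parse_summary(line).items():
        if (100 * count) // length != percent:
            mismatches.append(name)
    return mismatches


line = "Identities = 115/324 (35%) , Positives = 177/324 (54%) , Gaps = 8/324 (2%)"
print(parse_summary(line))
print(check_summary(line))
```

A field returned by `check_summary` indicates only that the counts and the percentage disagree under the truncation assumption; it does not show which of the two figures is in error.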

An alignment of the GAS and GBS proteins is shown below: 

Identities = 257/345 (74%) , Positives = 295/345 (85%) , Gaps = 1/345 (0%) 

Query: 1 MKEKQSKRLIYILLWSIIFISVFTYSISQPSKLLPPKELVILSPNSQAILTGTIPAFEE 60 

+K K+ L ++L+++ + ++V Y S SK KELVIL+PNSQ ILTGTIPAFEE 
Sbjct: 2 LKLKRKWLLSFLLVIIILAIVNVAMYIFSS-SKKDSAKELVILTPNSQTILTGTIPAFEE 60 

Query: 61 KYGIKVKLIQGGTGQLIDRLSKEGKQLKADIFFGGNYTQFESHKALFESYVSKNVHTVIP 120 
KYG+KV+LIQGGTGQLID+L ++ K L ADIFFGGNYTQFESHK LFESYVS V TVI 
Sbjct: 61 KYGVKVRLIQGGTGQLIDQLGRKDKPLNADIFFGGNYTQFESHKDLFESYVSPQVSTVIS 120 

Query: 121 DYIHPSDTATPYTINGSVLIVNNELAKGLTIKSYEDLLQPSLKGKIAFADPNTSSSAFSQ 180 
DY PS ATPYTINGSVLIVNNELA+GL I SYEDLLQP+LKGKIAFADPN+SSSAFSQ 
Sbjct: 121 DYQLPSHRATPYTINGSVLIVNNELARGLHITSYEDLLQPALKGKIAFADPNSSSSAFSQ 180 

Query: 181 LTNILLAKGGYTNPKAWNYVKKLQHNINAIKSSSSSEVYQSVAEGKMIVGLTYEDPSVNL 240 
LTNILLAKGGYTN AW Y+K+L N+N+I+++SSSEVYQSVAEGKMIVGLTYEDP +NL 
Sbjct: 181 LTNILLAKGGYTNADAWAYMKRLLVNMNSIRATSSSEVYQSVAEGKMIVGLTYEDPCINL 240 

Query: 241 QKSGANVSIVYPTEGTVFVPSSVAIIKNAPSMKEAKLFINFMLSLDVQNAFGQSTSNRPI 300 
QKSGANVSIVYP EGTVFVPSSVAIIK+AP+M EAKLFINFMLS DVQNAFGQSTSNRPI 
Sbjct: 241 QKSGANVSIVYPKEGTVFVPSSVAIIKHAPNMTEAKLFINFMLSRDVQNAFGQSTSNRPI 300 

Query: 301 RKDAQTSNGMKALKDIATLKEDYRYVTKHKGQILKTYNRIRRNAD 345 
R+DAQTS+ MKAL+ IATLKEDY YVTKHK +I+ TYN++R+ + 
Sbjct: 301 RQDAQTSHDMKALETIATLKEDYAYVTKHKKKIVATYNQLRQRLE 345 


SEQ ID 1616 (GBS263) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 4; MW 63kDa). 

The GBS263-GST fusion product was purified (Figure 205, lane 5) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 301), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 506 

A DNA sequence (GBSx0544) was identified in S.agalactiae <SEQ ID 1619> which encodes the amino 
acid sequence <SEQ ID 1620>. This protein is predicted to be a response regulator. Analysis of this protein 
sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4733 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF31452 GB:AF221126 putative response regulator [Streptococcus pneumoniae] 
Identities = 85/252 (33%), Positives = 147/252 (57%), Gaps = 17/252 (6%) 

Query: 2 YRLLIVEDEHLIRKWLRYAIDYQSLNILVVGEAKDGKEGAQLIQEEQPDIVLSDINMPIM 61 
Y +LIVEDE+L+R+ L ++ + ++ ++G+A++G++ +LIQ++ PDI+L+DINMP + 
Sbjct: 3 YTILIVEDEYLWQGLTKLVNVAAYDMEIIGQAENGRQAWELIQKQVPDIILTDINMPHL 62 

Query: 62 TAFDMFEATKGQSYAK IILSGYADFPNAQSAIHYGVLEFLTKPLEKQALIDCLKTIM 118 
+ + ++Y + + L+GY DF A SA+ GV ++L KP +Q + + L I 
Sbjct: 63 NGIQLASLVR-ETYPQVHLVFLTGYDDFDYALSAVKLGVDDYLLKPFSRQDIEEMLGKIK 121 

Query: 119 ARIE-EHKEKHLQEHTELYLPLPQANDQVPEVIKDMLAWIHSHFHGKIVISQLAHDLGYS 177 
+++ E KE+ LQ+ L+ ++I+LA ++LA DLG+S 
Sbjct: 122 QKLDKEEKEEQLQD LLTNRFEGNMAQKIQSHLA DSQFSLKSLASDLGFS 170 

Query: 178 ESYLYTVTKKHLHITLSDYINQYRINQAIQLMFREPDLMVYQIAEAVGIYDYRYFDRVFK 237 
+YL ++ KK L + DY+ + R+ QA +L+ DL +Y+IAE VG D YF + FK 
Sbjct: 171 PTYLSSLIKKELGLPFQDYLVRERVKQA-KLLLLTTDLKIYEIAEKVGFEDMNYFTQRFK 229 

Query: 238 KYLGQTVKAFKE 249 

+ G T + FK+ 
Sbjct: 230 QIAGVTPRQFKK 241 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1621> which encodes the amino acid 
sequence <SEQ ID 1622>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4239 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 193/257 (75%) , Positives = 226/257 (87%) 



Query: 1 MYRLLIVEDEHLIRKWLRYAIDYQSLNILVVGEAKDGKEGAQLIQEEQPDIVLSDINMPI 60 
MY+L+I+EDEH+IRKWLRYAIDY++L+ILV+GEAKDGKEGA LI+E QPDIVL+DINMPI 
Sbjct: 1 MYKLVIIEDEHIIRKWLRYAIDYKALDILVIGEAKDGKEGAVLIKESQPDIVLTDINMPI 60 

Query: 61 MTAFDMFEATKGQSYAKIILSGYADFPNAQSAIHYGVLEFLTKPLEKQALIDCLKTIMAR 120 
MTAFDMFE TK Q+YAKIILSGYADFPNA+SAIHYGVLEFLTKP+EK AL +CL+TI+A+ 
Sbjct: 61 MTAFDMFEVTKDQTYAKIILSGYADFPNARSAIHYGVLEFLTKPIEKAALWECLQTIIAK 120 

Query: 121 IEEHKEKHLQEHTELYLPLPQANDQVPEVIKDMLAWIHSHFHGKIVISQLAHDLGYSESY 180 
IE+ K + + +Y+PLPQ DQ+PEV+KD+L W+H+HF KI S+LAHDLGYSESY 
Sbjct: 121 IEKQKGSNQKTDACOTIPLPQMTDQIPEVVKDILEWVHAHFQDKISTSRLAHDLGYSESY 180 

Query: 181 LYTVTKKHLHITLSDYINQYRINQAIQLMFREPDLMVYQIAEAVGIYDYRYFDRVFKKYL 240 
+Y KKHL + LSDYINQYRINQAIQLM +EPDLMVY+IA+AVGIYDYRYFDRVFKKYL 
Sbjct: 181 IYQNIKKHLQMPLSDYINQYRINQAIQLMQQEPDLMVYEIAQAVGIYDYRYFDRVFKKYL 240 

Query: 241 GQTVKAFKEEHIFKQMD 257 
GQTVKAFKEEH K D 
Sbjct: 241 GQTVKAFKEEHFMKDTD 257 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 507 

A DNA sequence (GBSx0545) was identified in S.agalactiae <SEQ ID 1623> which encodes the amino 
acid sequence <SEQ ID 1624>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2964 (Affirmative) < succ> 
bacterial membrane Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 508 

A DNA sequence (GBSx0546) was identified in S.agalactiae <SEQ ID 1625> which encodes the amino 
acid sequence <SEQ ID 1626>. This protein is predicted to be a two-component sensor histidine kinase. 
Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.80 Transmembrane 266 - 282 ( 257 - 285) 
INTEGRAL Likelihood =-12.90 Transmembrane 29 - 45 ( 24 - 51) 

Final Results 

bacterial membrane Certainty=0.6519 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10197> which encodes amino acid sequence <SEQ ID 
10198> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05628 GB:AP001513 two-component sensor histidine kinase 
[Bacillus halodurans] 
Identities = 84/258 (32%) , Positives = 138/258 (52%) , Gaps = 23/258 (8%) 

Query: 298 SSAINQMVLDMDAISRQEKSSIELDSQDEFQYLSVQINQMVSRLKDLHEKTLDLETQKLL 357 
S INQ+ S K+ I +D +DE LSVQ NQMV+ L+ L + + QK L 
Sbjct: 327 SERINQVA SGDLKTKIWDGKDEIGQLSVQFNQMVANLRSLIHQVHETNRQKRL 380 

Query: 358 FEK --RMLEAQFNPHFLYNTLETILITSHYDSQL-TERIVIQLTKLLRYSLSGST 409 
EK +ML +Q NPHFL+NTLE+I + SH + ++V QL KL+R SL + 
Sbjct: 381 LEKSQNEIKLKMLASQINPHFLFNTLESIRMKSHMKGETEIAKOTKQLGKLMRKSLEVTG 440 

Query: 410 EAAVLKDDLAIIESYLLINQVRF-EELTYTISVSPELEHMRVPKLFLLPLIENAIKYGLK 468 
L+++L ++ YLI R++LY++P+E++ L+ PL+ENA+ +GL+ 
Sbjct: 441 HHIPLRNELDMVRCYLEIQTFRYGDRLHYELYIDPQSEMVEILPLIIQPLVENAVIHGLE 500 

Query: 469 ERHD-VAINIDIWQDSDGIWFTVSNNGSGISLARQQAIRTMLRSTH SHHGLINSYR 523 
D + I + + + V+++G G+ + +AI+ ML + GL+N ++ 
Sbjct: 501 RTEDGGTVTISTIWGITOLWIVNDDGCGMDEEKLEAIQNMLHHPQEVDGNKIGLLNVHK 560 

Query: 524 RLQYQF STVLLEFTK 538 
RLQ + S +++E K 
Sbjct: 561 RLQLTYGKTSGLIIESAK 578 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1627> which encodes the amino acid 
sequence <SEQ ID 1628>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.88 Transmembrane 27 - 43 ( 22 - 49) 
INTEGRAL Likelihood = -9.08 Transmembrane 263 - 279 ( 258 - 282) 

Final Results 

bacterial membrane Certainty=0.5352 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:BAB05628 GB:AP001513 two-component sensor histidine kinase 
[Bacillus halodurans] 
Identities = 85/270 (31%) , Positives = 139/270 (51%) , Gaps = 20/270 (7%) 

Query: 276 IFVILQRKSSGLANRIAAKNSRAINQMVRDMSAISRQEKRRIDLESQDEFQYLSDQINQM 335 
+ V+L S L ++ + S INQ+ S K +I ++ +DE LS Q NQM 
Sbjct: 307 VAVLLIVHFSWLISKRLSHLSERINQVA SGDLKTKIWDGKDEIGQLSVQFNQM 360 

Query: 336 VERLQQLHDKTLDLETQKLLFEK RMLEAQFNPHFLYNTLETILITSHYDSAL- 387 
V L+ L + + QK L EK +ML +Q NPHFL+NTLE+I + SH 
Sbjct: 361 VANLRSLIHQVHETNRQKRLLEKSQNEIKLKMLASQINPHFLFNTLESIRMKSHMKGETE 420 

Query: 388 TEKIVIQLTKLLRYSLTDSSKPVLLKDDLSVIESYLVINQVRF-EELQYSINLSPDLDSL 446 
K+V QL KL+R SL + + L+++L ++ YLI R++LY++P ++ 
Sbjct: 421 IAKWKQLGKLMRKSLEVTGHHIPLRNELDMVRCYLEIQTFRYGDRLHYELYIDPQSEMV 480 

Query: 447 EVPKLFLLPLIENAIKYGLKERHD-VKINIACYYQDDHIIFSVRDNGSGIDAHHQKVIRE 505 
E+ L + PL+ENA+ +GL+ D +I+ + + V D+G G+D + I + 
Sbjct: 481 EILPLIIQPLVENAVIHGLERTEDGGTVTISTIVNGNDLTVIVNDDGCGMDEEKLEAIQN 540 

Query: 506 QL EAGESHHGLINSYRRLKYHFSEVS 531 
L E + GL+N ++RL+ + + S 
Sbjct: 541 MLHHPQEVDGNKIGLLNVHKRLQLTYGKTS 570 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 369/549 (67%) , Positives = 449/549 (81%) 

Query: 3 MRGYRMEERFKKRLQDDISKHFSRQSLILSLLLIALFVLFSLAPQQIGLYKDVNSVSYSY 62 
25 MRG ++EE FKK+LQDDIS+HFS QSL+LSLLLI LF++FSLAPQQ+GLY+D+N+ + Y 

Sbjct: 1 MRGEQVEEHFKKQLQDDISRHFSYQSLMLSLLLIGLFIIFSLAPQQLGLYRDINATATRY 60 

Query: 63 KQLIQKHDTLLDDLGKNSLKPFVSGHLGSADLSKQYYHLRNHLQSQTELLVFSPNQELLF 122 
+LI K + LLDDLGKNSL PF++ +L +ADLSK Y+HLR+ Q+ ELL+FSP+Q+LLF 
30 Sbjct: 61 HRLISKQEALLDDLGKNSLLPFLNKNLSTADLSKHYFHLRHSSQTSPELLLFSPSQDLLF 120 

Query: 123 ASNSHLGNFFSKSIYISEVlDKAKINQRIiLKIIVDSEGGHYLRLIKPIIVNKKVSGYAFL 182 

ASN HLGN FSKS+YI EVL + L K +DSE GHYL +1 P+I ++ GYAFL 

Sbjct: 121 ASNPHLGNVFSKSvYIQEVLRATHSPKTLFKDAMDSEDGHYLMIlMPMIDQNQLKGYAFL 180 

35 

Query: 183 LMNGKDFLLPTKAINSDLIIADQLNNSFTFTNRDFISSSLDKVDSQFLTRYFSFHDHRAF 242 

+M+GKDFL PTK + S+L+IAD+L+N+FTF+NR+FI+SSLDK++SQ+L YF F D+RAF 
Sbjct: 181 VMSGKDFLHPTKTLTSEIjVIADKLDNTFTFSNREFIASSLDKINSQYLHHYFVFQDNRAF 240 

40 Query: 243 VTOKVALQDNILLYMYRPLIPVTLVvlFSLVSSVIIFVILRQKSRVLADRIAVKNSSAIN 302 

+ RKVALQ + LYMYRPLIP+ V+LFSL+SS +IFVIL++KS LA+RIA KNS AIN 
Sbjct: 241 ITRKVALQGGLWLYMYRPLIPMVSVMLFSLISSAVIFVILQRKSSGLANRIAAKNSRAIN 300 

Query: 303 QMVLDMDAISRQEKSSIELDSQDEFQYLSVQINQMVSRLKDLHEKTLDLETQKLLFEKRM 362 
45 QMV DM AISRQEK I+L+SQDEFQYLS QINQMV RL+ LH+KTLDLETQKLLFEKRM 

Sbjct: 301 QMVRDMSAISRQEKRRIDLESQDEFQYLSDQINQMVERLQQLHDKTLDLETQKLLFEKRM 360 

Query: 363 LEAQFNPHFLYNTLETILITSHYDSQLTERIVIQLTKLLRYSLSGSTEAAVLKDDLAIIE 422 
LEAQFNPHFLYNTLETILITSHYDS LTE+IVIQLTKLLRYSL+ S++ +LKDDL++IE 
Sbjct: 361 LEAQFNPHFLYNTLETILITSHYDSALTEKIVIQLTKLLRYSLTDSSKPVLLKDDLSVIE 420

Query: 423 SYLLINQWFEELTYTISVSPELEHMRVPKLFLLPLIENAIKyGLKERHDVAINIDIWQD 482 

SYL+ INQVRFEEL Y+I++SP+L+ + VPKLFLLPLIENAIKYGLKERHDV INI + 
Sbjct: 421 SYLVINQVRFEELQYSINLSPDLDSLEVPKLFLLPLIENAIKYGLKERHDVKINIACYYQ 480 


Query: 483 SDGIWFWSNNGSGISLARQQAIRTMLRSTHSHHGLINSYRRLQYQFSTVLLEFTKTDDA 542 

D I F+V +NGSGI Q+ IR L + SHHGLINSYRRL+Y FS V L F + D 
Sbjct: 481 DDHIIFSVRDNGSGIDAHHQKVIREQLEAGESHHGLINSYRRLKYHFSEVSLVFDQGDKQ 540 

Query: 543 FRVSYIVKE 551

F VSY VKE 
Sbjct: 541 FNVSYHVKE 549 
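The "Identities" and "Positives" figures reported for an alignment such as the one above can be tallied from the middle line of each block. The following is a minimal sketch, assuming the usual BLAST-style convention that a letter on the middle line marks an identical residue and a "+" marks a conservative substitution; the function name score_midline is hypothetical and is not part of the original analysis.

```python
def score_midline(midline: str) -> tuple[int, int]:
    """Return (identities, positives) for one BLAST-style middle line.

    Assumed convention: a letter marks an identical residue pair,
    a '+' marks a conservative (positive-scoring) substitution.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


# Middle line of the final block of the alignment above
# (query residues 543-551, subject residues 541-549).
identities, positives = score_midline("F VSY VKE")
print(identities, positives)
```

Summing these per-block counts over every block of an alignment should reproduce the totals in its header line, provided the middle lines have survived transcription intact.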



A related GBS gene <SEQ ID 8587> and protein <SEQ ID 8588> were also identified. Analysis of this
protein sequence reveals the following:
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Lipop: Possible site: -1 Crend: 10 

McG: Discrim Score: 6.23 

GvH: Signal Score (-7.5): -0.0500002 

Possible site: 38 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 1 value: -13.80 threshold: 0.0 

INTEGRAL Likelihood =-13.80 Transmembrane 259 - 275 ( 250 - 278) 
PERIPHERAL Likelihood = 2.70 404 
modified ALOM score: 3.26 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0.6519 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
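Each "Final Results" block lists a certainty value per predicted cellular location. A minimal sketch of how such a block might be read programmatically follows; the function name parse_final_results and the regular expression are assumptions made for illustration, and the pattern deliberately tolerates stray whitespace inside the number, since some transcriptions of these blocks contain it.

```python
import re

# Matches e.g. "bacterial membrane Certainty=0. 6519 (Affirmative)".
# The value group accepts digits, dots and stray spaces.
_RESULT = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d .]+?)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text: str) -> dict[str, tuple[float, str]]:
    """Map each location to (certainty, call) for one Final Results block."""
    results = {}
    for match in _RESULT.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("call").strip())
    return results


SAMPLE = """
bacterial membrane Certainty=0. 6519 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>
"""

parsed = parse_final_results(SAMPLE)
best_site = max(parsed, key=lambda site: parsed[site][0])
print(best_site, parsed[best_site])
```

Taking the location with the highest certainty gives the predicted localisation, here the membrane.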

The protein has homology with the following sequences in the databases: 

33.2/53.9% over 181aa
Streptococcus pneumoniae

GP|5830535| histidine kinase Insert characterized

ORF00032 (1309 - 1848 of 2253)

GP|5830535|emb|CAB54576.1||AJ006396(1 - 182 of 231) histidine kinase {Streptococcus pneumoniae}
%Match = 5.9

%Identity = 33.2 %Similarity = 53.8

Matches = 61 Mismatches = 78 Conservative Sub.s = 38

30 1053 1083 1113 1143 1173 1203 1233 1263 

FVTOKVALQDNILLYMYRPLIPOTLWLFSLVSSVIIFVILRQKSRVIiADRI^ 

1293 1323 1350 1380 1410 1440 1494 

DSQDEFQYLSVQINQMVSRL-KDLHEKTLDLETQKLLFEKRMLEAQFNPHFLYNTLETILITSHYDSQ--LTERIVIQLT 

35 h| |: = |: II = I hi I lllhlllll = = = || | .: |= ::: • 

MLDRLEKNIHD- IYQLELSQKXIANMRALQAQINPHFMYNTLEFLRMYAVMQSQDELAD- IIYEFS 
10 20 30 40 50 60 

1524 1554 1584 1611 1641 1671 1701 1728 

40 KLLRYSLSGSTEAAVLKDDLAIIESYLLINQVRF-EELTYTISVSPELEHMRVPKLFLLPLIENAIKYGLKERH-DVAIN 

III ::| =11 =1 | : ||= : : | : Mlhl-lh I Ihll =h I I h 

SLLRNNIS-DERETLLKQELEFCRKYSYLCMVRYPKSIAYGFKIDPELENMKIPKFTLQPLVENYFAHGVDHRRTDNVIS 
80 90 100 110 120 130 140 

45 1758 1788 1818 1848 1878 1908 1938 1968 

idiwqdsdgiwftvs™gsgisiarqqairtmlrsthshhglinsyrrlqyqfstvlleftktddafrvsyivke*vmyr 

I : = I :|| hi » II I = I I =1 II I 

ikalkqdgfveilvvdngrgmsaeklanireklsqryfehqasysdqrqsigivnvherfvlyfgdryaitiesaeqagv 

160 170 180 190 200 210 220 

SEQ ID 8588 (GBS47) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 2; MW 84kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 4; MW 59.3kDa).

GBS47-His was purified as shown in Figure 221, lanes 4-5.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
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Example 509 

A DNA sequence (GBSx0547) was identified in S.agalactiae <SEQ ID 1629> which encodes the amino
acid sequence <SEQ ID 1630>. This protein is predicted to be phosphotransferase enzyme II, D 
component. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.46 Transmembrane 258 - 274 ( 252 - 274) 

INTEGRAL Likelihood = -9.13 Transmembrane 232 - 248 ( 227 - 251) 

INTEGRAL Likelihood = -5.31 Transmembrane 142 - 158 ( 140 - 161) 

INTEGRAL Likelihood = -2.50 Transmembrane 119 - 135 ( 118 - 139)
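The INTEGRAL lines above each give a likelihood score, a predicted transmembrane span, and an extended span in parentheses. As a minimal sketch, assuming only the line layout shown here, they could be collected as follows; the function name parse_tm_segments is hypothetical.

```python
import re

# One INTEGRAL line: likelihood, core span, then extended span in parentheses.
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text: str) -> list[tuple[float, int, int, int, int]]:
    """Return (likelihood, start, end, ext_start, ext_end) per INTEGRAL line."""
    return [
        (float(score), int(start), int(end), int(ext_start), int(ext_end))
        for score, start, end, ext_start, ext_end in _TM.findall(text)
    ]


SAMPLE = """
INTEGRAL Likelihood =-10.46 Transmembrane 258 - 274 ( 252 - 274)
INTEGRAL Likelihood = -9.13 Transmembrane 232 - 248 ( 227 - 251)
INTEGRAL Likelihood = -5.31 Transmembrane 142 - 158 ( 140 - 161)
INTEGRAL Likelihood = -2.50 Transmembrane 119 - 135 ( 118 - 139)
"""

segments = sorted(parse_tm_segments(SAMPLE), key=lambda seg: seg[1])
for score, start, end, _, _ in segments:
    print(f"{start}-{end}: likelihood {score}")
```

Sorting by start position lists the four predicted segments in sequence order, from residues 119-135 to 258-274.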

Final Results 

bacterial membrane Certainty=0.5182 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC74889 GB:AE000276 PTS enzyme IID, mannose-specific
[Escherichia coli K12]
Identities = 94/280 (33%), Positives = 156/280 (55%), Gaps = 13/280 (4%)

Query: 3 SQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKD-KAEA 61 

++ LT+ D +R VF RS S + A+G ++++P I R Y + + + +A 

Sbjct: 12 TEKKLTQSD IRGVFLRSNLFQGS - WNFERMQALGFCFSMVPAIRRLYPENNEARKQA 67 


Query: 62 LVRHTTWFNATMHINNFIMGLVASMEKKNSEDPDFDASAITAVKASLMGPISGVGDSFFW 121 

+ RH +FN + I+G+ ++E++ + + D AI +K LMGP++GVGD FW 
Sbjct: 68 IRRHLEFFNTQPFVAAPILGVTLALEEQRANGAEIDDGAINGIKVGLMGPLAGVGDPIFW 127

Query: 122 GILRVIAAGIGISLASTGSAMGAVVFLLLYNIPAFLIHYYSLYGGYSVGAGFIKKLYESG 181

G +R + A +G +A +GS +G ++F +L+N+ YY + GYS G +K + G 

Sbjct: 128 GTVRPVFAALGAGIAMSGSLLGPLLFFILFNLVRLATRYYGVAYGYSKGIDIVKDM-GGG 186 

Query: 182 GIKIVTKTSSMLGLMMVGSM TASNVKFKTILTVAAKGAKEAASIQSYLDQLFVGVV 237 

35 ++ +T+ +S+LGL ++G++ T N+ G + ++Q+ LDQL G+V 

Sbjct: 187 FLQKLTEGASILGLFVMGALVWKWTHVNIPLVVSRITDQTGKEHVTTVQTILDQLMPGLV 246 

Query: 238 PLLVTILAFWLLRKKVNINWIMFGIMVLGI - - - VLGLLGI 274 
PLL+T WLLRKKVN WI+ G V+GI GLLG+ 
Sbjct: 247 PLLLTFACMWLLRKKVNPLWIIVGFFVIGIAGYACGLLGL 286

A related DNA sequence was identified in S.pyogenes <SEQ ID 1631> which encodes the amino acid 
sequence <SEQ ID 1632>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.98 Transmembrane 255 - 271 ( 251 - 274) 

INTEGRAL Likelihood = -7.01 Transmembrane 232 - 248 ( 228 - 250) 

INTEGRAL Likelihood = -5.68 Transmembrane 142 - 158 ( 140 - 161) 

INTEGRAL Likelihood = -2.50 Transmembrane 119 - 135 ( 118 - 139) 

Final Results 

bacterial membrane Certainty=0.4991 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC74889 GB:AE000276 PTS enzyme IID, mannose-specific
[Escherichia coli] 

Identities = 94/281 (33%) , Positives = 157/281 (55%) , Gaps = 13/281 (4%) 
Query: 2 TSQDNLTKEDRKMLRS VFWRSWTMNASRTGATQYHAVGVI YTLLPVINRFYKTDKD - KAE 60 
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T++ LT+ D +E W RS S + A+G ++++P I R Y + + + + 

Sbjct: 11 TTEKKLTQSD IRGVFLRSNLFQGS-WNFERMQALGFCFSMVPAIRRLYPENNEARKQ 66 

Query: 61 ALWHTTWFNATMHINNFIMGLVASMEKKNSEDPDFDASAITAVKASLMGPISGVGDSFF 120 

A+ RH +FW + I+G+ ++E++ + + D AI +K LMGP++GVGD F 
Sbjct: 67 AIRRHLEFFNTQPFVAAPILGVTLALEEQRANGAEIDDGAINGIKVGLMGPLAGVGDPIF 126 

Query: 121 WGI LRVIAAGIG I SIASAGSAMGA VVFLLLYNI PAFI IHYYSLYGGYSVGAGF I KKLYES 180 

WG +R + A +G +A +GS +G ++F +L+N+ YY + GYS G +K + 

Sbjct: 127 WGTWPVFAALGAGIAMSGSLLGPLLFFILFNLWLATRYYGVAYGYSKGIDIVKDM-GG 185 

Query: 181 GGIKIVTKTSSMLGLMMVGSM TASNVKFKTILTVAAKGAKEAAS IQDYLDQLFIGI 236 

G ++ +T+ +S+LGL ++G++ T N+ G + ++Q LDQL G+ 

Sbjct: 186 GFLQKLTEGASILGLFVMGALVNKWTHVNIPLWSRITDQTGKEHVTTVQTILDQLMPGL 245 

Query: 237 VPLMVTLAAFWLLRKKVNI IWIMFGIMFLGI - - - ILGLLGI 274 

VPL++T A WLLRKKVN +WI+ G +GI GLLG+ 
Sbjct: 246 VPLLLTFACMWLLRKKVNPLWI IVGFFVIGIAGYACGLLGIi 286 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 263/275 (95%), Positives = 269/275 (97%) 

Query: 1 MKSQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKDKAE 60 

M SQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKDKAE 
Sbjct: 1 MTSQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKDKAE 60 

Query: 61 ALVRHTTWFNATMHINNFIMGLVASMEKKNSEDPDFDASAITAVKASLMGPISGVGDSFF 120 

ALVRHTTWFNATMHINNFIMGLVASMEKKNSEDPDFDASAITAVKASLMGPISGVGDSFF 
Sbjct: 61 ALVRHTTWFNATMHINNFIMGLVASMEKKNSEDPDFDASAITAVKASLMGPISGVGDSFF 120 

Query: 121 WGILRVIAAGIGISLASTGSAMGAVVFLLLYNIPAFLIHYYSLYGGYSVGAGFIKKLYES 180 

WGILRVIAAGIGISLAS GSAMGAVVFIjLLYNIPAF+IHYYSLYGGYSVGAGFIKKLYES 
Sbjct: 121 WGILRVIAAGIGISLASAGSAMGAVVFLIiLYNIPAFIIHYYSLYGGYSVGAGFIKKLYES 180 

Query: 181 GGIKIVTKTSSMLGLMMVGSMTASNVKFKTILTVAAKGAKEAASIQSYLDQLFVGWPLL 240 

GGI KI VTKTSSMLGLMMVGSMTASNVKFKTI LTVAAKGAKEAASIQ YLDQLF+G+VPL+ 
Sbjct: 181 GGIKIVTKTSSMLGL>MVGSMTASNVKFKTILTVAAKGAKEAASIQDYLDQLFIGIVPLM 240 

Query: 241 VTIIAFWLLRKKVNINWIMFGIMVLGIVLGLLGIC 275 

VT+ AFWLLRKKVNI WIMFGIM LGI+LGLLGIC 
Sbjct: 241 VTLAAFWLLRKKVNIIWIMFGIMFLGIILGLLGIC 275 
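The percentages in an alignment header can be checked against the fractions printed beside them. The sketch below is illustrative only: the function name check_header is hypothetical, and the comparison assumes the percentage is the fraction truncated to a whole number, which matches the header of the GAS/GBS alignment above (263/275 is about 95.6% and is printed as 95%). Headers for which the printed figure differs from the truncated value are simply reported, not corrected.

```python
import re

_FRACTION = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def check_header(line: str) -> dict[str, tuple[int, int, int, int]]:
    """Map each label to (numerator, denominator, printed_pct, truncated_pct)."""
    checked = {}
    for label, num, den, pct in _FRACTION.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        checked[label] = (num, den, pct, 100 * num // den)
    return checked


header = "Identities = 263/275 (95%), Positives = 269/275 (97%)"
for label, (num, den, printed, truncated) in check_header(header).items():
    status = "consistent" if printed == truncated else "differs"
    print(f"{label}: {num}/{den} printed {printed}%, truncated {truncated}% ({status})")
```

A mismatch reported by this check may point to a transcription error in either the fraction or the percentage, so it is a prompt for manual inspection and not a correction in itself.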

There is also homology to SEQ ID 5236. 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9077> which encodes the amino 
acid sequence <SEQ ID 9078>. An alignment of the GAS and GBS sequences follows: 

Score = 178 bits (448) , Expect = 3e-47 

Identities = 83/136 (61%) , Positives = 108/136 (79%) 

Query: 2 IMEEITIYHNPNCGTSRNVIAMIRHAGIEPTIIEYLQTPPNRETLIELLQSMGISARELL 61 

+ME+I IYHNPNCGTSRNVLA+IRH GIEP II YL+TPP+R L+ELL M +SARELL 
Sbjct: 1 MMEKIRIYHNPNCGTSRNVLAIIRHCGIEPEIIYYLKTPPSRMELVELLLEMKLSARELL 60 

Query: 62 RTWPEFFAYGIANQAVAEKDIINAMLADPILINRPIVOTRKGVKLCRPSETLLDILPVP 121 

RT+VP +E + L + +V ++++I+AM+ DPILINRPIWT KG KLCRP E +L ILPV 
Sbjct: 61 RTDVPAYEKFNLESSSVTDEEMIDAMIQDPILINRPI'VVTSKGAKLCRPCEAILTILPVK 120 

Query: 122 LPSPYIKEDGESVNPI 137 

+ ++KEDG+ + + 
Sbjct: 121 MEKDFVKEDGQI IQSL 136 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
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Example 510 

A DNA sequence (GBSx0548) was identified in S.agalactiae <SEQ ID 1633> which encodes the amino 
acid sequence <SEQ ID 1634>. This protein is predicted to be PTS permease for mannose subunit IIPMan. 
Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq 
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Final Results 

bacterial membrane Certainty=0 .4482 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC44680 GB:U65015 PTS permease for mannose subunit IIPMan 
[Vibrio furnissii] 

Identities = 70/251 (27%) , Positives = 132/251 (51%) , Gaps = 6/251 (2%) 



Query: 2 IMPATMAALAVLICFGGNYLTGQSMMERPLWGLVTGMLLGDIKVGILMGASLEALFLGN 61
+ AML +G+ G+ RP+V+G + G++LGD+ GIL+G +LE +++G
Sbjct: 5 LFQMM^LLAFIA-GLDLFNGLTHFHRF 63

Query: 62 WIGGVIAAEPVTATAMATTFTIISNITOKftAMTIiAVPIGMIiAAFvVMFLKNVFMNI FAP 121
+ G + T + TTF I +N++ A+ +AVP + + L + + +
Sbjct: 64 APLAGAQPPWIIGTIVGTTFAITTNTOPNVATOVAVPFAVAVQMGITLLFSAMSAVMSK 123

Query: 122 ^TTOKAAAANHQGKLVMLHYGTWII- - YYLIIASISFIGILVGSGPVNSFVHHIPQNLMNG 179
+ A A+ +G + ++ ++ +Y + A F+ I +G+ + V +P+ L++G
Sbjct: 124 CDEYAKNADTRGI ERVNYFALAVLGS FYFLCA FLPIYLGADHAGAMVAALPKALIDG 180

Query: 180 LSAAGGLLPAVGFAMLMKLLWTNKLAVFYLLGFVljTAYLKLPAVAVAALGAVICVISSQR 239
L AGG++PA+GFA+LMK++ N +++LGFV A+L+LP +A+ + +1 R
Sbjct: 181 LGVAGGIMPAIGFAVLMKimKNAYIPYFILGFVAAAWLQLPILAIRCAATAMAIIDFMR 240

Query: 240 DIELDAITRGA 250
E + A
Sbjct: 241 KSEPTPVNASA 251



A related DNA sequence was identified in S.pyogenes <SEQ ID 1635> which encodes the amino acid 
sequence <SEQ ID 1636>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have an uncleavable N-term signal seq
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Final Results 

bacterial membrane Certainty=0.4482 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
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The protein has homology with the following sequences in the databases: 

>GP:AAC44680 GB:U65015 PTS permease for mannose subunit IIPMan
[Vibrio furnissii]

Identities = 72/251 (28%), Positives = 132/251 (51%), Gaps = 6/251 (2%)

Query: 2 LVPATMAALAVLICFGGNYLTGQS^ERPLWGLVTGLLLGDMKVGILMGASLEALFLGN 61 

LAM L + G+ G+ RP+V+G + GL+LGD+ GIL+G +LE +++G 
Sbjct: 5 LFQALMLGLLAFLA-GLDLFNGLTHFHRPVVLGPLVGLILGDLHTGILVGGTLELIWMGL 63 

10 Query: 62 VNIGGVIAAEPVTATAMATTFTIISHIDQKAAMTIAVPIGMIJAFVvMFLKWFMNIFAP 121 

+ G + T + TTF I ++++ A+ +AVP + + L + + + 

Sbjct: 64 APIAGAQPPWIIGTIVGTTFAITTNTOPNVAVGVAVPFAVAVQMGITLLFSAMSAVMSK 123 

Query: 122 MVDKAAAANHQGKLVMLHYGTWII--YYLIIASISFIGILVGSGPVNAFVEHIPQNLMNG 179 
15 + A A+ +G + ++ ++ +Y + A F+ I +G+ A V +P+ L++G 

Sbjct: 124 CDEYAKNADTRGIERVNYFALAVLGSFYFLCA FLPIYLGADHAGAMVAALPKALIDG 180 

Query: 180 LSAAGGLLPAVGFAMLMKLLVmJKLAVFYLLGFVLTAYLKLPAVAVAALGAVICVISSQR 239 
L AGG++ PA+GFA+LMK+ + N - +++LGFV A+L+LP +A+ + +1 R 

20 Sbjct: 181 LGVAGGIMPAIGFAVIiMKI^KNAYIPYFILGFVAAAWLQLPILAIRCAATAMAIIDFMR 240 

Query: 240 DLELDAITRGA 250 

E + A 
Sbjct: 241 KSEPTPVNASA 251 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 261/269 (97%) , Positives = 268/269 (99%) 

Query: 1 MIMPAT^WiIAVLICFGGNYLTGQSMMERPLvVGLvTG^^I/3DIKVGILMGASLEALFIlG 60 
30 M++PATMAALAVLICFGGNYLTGQSMMERPLWGLVTG+LLGD+KVGILMGASLEALFLG 

Sbjct: 1 MLVPATMAALAVLICFGGNYLTGQSMMERPLWGLVTGLIiIjGDMKVGILMGASLEALFLG 60 

Query: 61 NVNIGGVIAAEPVTATAMATTFT 1 1 SNIDQKAAMTLAVPIGMLAAFWMFLKNVFMNI FA 120 
NVNIGGVIAAEPVTATAMATTFTIIS+IDQKAAMTLAVPIGMLAAFVVMFLKNVFMNIFA 
35 Sbjct: 61 NVNIGGVIAAEPVTATAI^TTFTIISHIDQKAAOTIAVPIGMLAAFV 120 

Query: 121 PDMKAAAAiraQGKLVMLHYGTWIIYYLIIASISFIGILVGSGPWSFvHHIPQNLMNGL 180 

PMVDKAAAANHQGKLVMLHYGTWIIYYLIIASISFIGILVGSGPVN+FV HIPQNLMNGL 
Sbjct: 121 PMVDKAAAANHQGKL VMLHYGTWI I YYLI IASISFIGILVGSGPVNAFVEHI PQNLMNGL 180 


Query: 181 SARGGLLPAVGFAMLMKLLWTNKLAVFYLLGFVLTAYLKLPAVAVAALGAVICVISSQRD 240 

SAAGGLLPAVGFAMLMKLLWTNKLAVFYLLGFVLTAYLKLPAVAVAALGAVI CVI SSQRD 
Sbjct: 181 SAAGGLLPAVGFAMLMKLLVraKLAVFYLLGFVLTAYLKLPAVAVAALGAVICVISSQRD 240 

45 Query: 241 IELDAITRGAISKQTTFDSKESEEEDFFA 269 

+ELDAITRGAISKQTTFDSKESEEEDFFA 
Sbjct: 241 LELDAITRGAISKQTTFDSKESEEEDFFA 269 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 511 

A DNA sequence (GBSx0549) was identified in S.agalactiae <SEQ ID 1637> which encodes the amino 
acid sequence <SEQ ID 1638>. This protein is predicted to be pts system, sorbose-specific iib component. 
Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1874 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
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i 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA46858 GB:X66059 EIII-B Sor PTS [Klebsiella pneumoniae] 
Identities = 49/158 (31%), Positives = 94/158 (59%), Gaps = 8/158 (5%)

Query: 2 ITQIRVDDRLIHGQVAvWTKEmAPLLVVANDEAAKNEITQMTLKMAVPNGMKLLIRSV 61 

IT R+DDRLIHGQV VW+K NA +++ ND+ +E+ + L+ A P GMK+ + S+ 
Sbjct: 3 ITLARIDDRLIHGQ VTTVWSKVANAQRI 1 1 CNDDVFNDE VRRTLLRQAAPPGMKVNVVSL 62 


Query: 62 EESIALFKDPRATDKRIWITOSVKDACTIAKNITDLEAVWANVGRFDKSDPATKVKLT 121 

E+++A++ +P+ D+ +F + + D T+ + + +N+ + + K +LT 

Sbjct: 63 EKAVAVYHNPQYQDETVFYLFTNPHDVLTMVRQGVQIATLNIGGM AWRPGKKQLT 117 

15 Query: 122 SSLLLNTEELEAAKELASL- PDLDVFNQVLPSNTKVNL 158 

++ L+ ++++A +EL L LD+ +V+ S+ VN+ 
Sbjct: 118 KAVSLDPQDIQAFRELDKLGVKLDL- -RWASDPSVNI 153 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1639> which encodes the amino acid
sequence <SEQ ID 1640>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1874 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>


An alignment of the GAS and GBS proteins is shown below: 

Identities = 145/162 (89%), Positives = 152/162 (93%)

Query: 1 MITQIRVnDRLIHGQVAWVWKEINAPLLVVANDEAAKNEITQMTLKI^VPNGMKLLIRS 60 

MITQIRVDDRLIHGQVAVVm'KELNAPLLWANDFAAKNEITQMTLKMAVPNGMKLLIRS 
Sbjct: 1 MITQIRVDDRLIHGQVAVVWTKELNAPLLWANDEAAKl^ITQMTLKMAVPNGMKLLIRS 60 


Query: 61 VEESIALFKDPRATDKRIFVIVNSVKDACTIAKNITDLEAVNVANVGRFDKSDPATKVKL 120 

VE+SI LF DPRA DKRIFVIVNSVKDAC IAK + DLEAVNVANVGRFDKSDPA+KVK+ 
Sbjct: 61 VEDS I KLFNDPRAKDKRI WI VNSVKDACAIAKEVPDLEAVNVANVGRFDKSDPASKVKV 120 

40 Query: 121 TSSLLLNTEELEAAKELASLPDLDVFNQVLPSNTKVNLSQLV 162 

T SLLLN EE+ AAKEL SLP+LDVFNQVLPSNTKV+LSQLV 
Sbjct: 121 TPSLLLNPEEMAAAKELVSLPELDVFNQVLPSNTKVHLSQLV 162 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 512 

A DNA sequence (GBSx0550) was identified in S.agalactiae <SEQ ID 1641> which encodes the amino 
acid sequence <SEQ ID 1642>. Analysis of this protein sequence reveals the following: 

Possible site: 46
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.22 Transmembrane 87 - 103 ( 87 - 104)

Final Results

bacterial membrane Certainty=0.1489 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
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A related DNA sequence was identified in S.pyogenes <SEQ ID 1643> which encodes the amino acid 
sequence <SEQ ID 1644>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.44 Transmembrane 87 - 103 ( 87 - 104)

Final Results

bacterial membrane Certainty=0.1574 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
An alignment of the GAS and GBS proteins is shown below:

Identities = 115/141 (81%), Positives = 125/141 (88%)

Query: 1 MKRKFLIGSHGKLASGLQSSIDILTGKGQEIQTIDAYIDDSDYTKSIVEFIDEIAPDEQG 60 

MKRKFLIGSHG+LASGLQSSIDIL G GQ ++TIDAY+DDSDYT I +FI +A DEQG 
Sbjct: 1 MKRKFLIGSHGRLASGLQSSIDILAGMGQALETIDAYVDDSDYTSQIDDFIAGVAADEQG 60 


Query: 61 LIFTDLLGGSVNQKMATAVMNSGKNNIFLITNSNLATLLSLLFLKPEEELTKEEIVTVIN 120 

LIFTDLLGGSVNQKM TAVMNSGK+NIFLITNSNLATLLSL+FLKP E LTK+EIVTVIN 
Sbjct: 61 L I FTDLLGGS VNQKMVTAVMNSGKDNI FL I TNSNLATLLSLVFLKPGEALTKDE I VTVIN 120 

25 Query: 121 ESQVQLVDLSFKAGSEDDFFD 141 

ESQVQLVDL + SEDDFFD 
Sbjct: 121 ESQVQLVDLVPETNSEDDFFD 141 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 513 

A DNA sequence (GBSx0551) was identified in S.agalactiae <SEQ ID 1645> which encodes the amino
acid sequence <SEQ ID 1646>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2469 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 514 

A DNA sequence (GBSx0552) was identified in S.agalactiae <SEQ ID 1647> which encodes the amino 
acid sequence <SEQ ID 1648>. This protein is predicted to be racemase. Analysis of this protein sequence 
reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence






INTEGRAL Likelihood = -8.65 Transmembrane 319 - 335 ( 316 - 339)
INTEGRAL Likelihood = -6.
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Final Results 

bacterial membrane Certainty=0.4461 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF71283 GB:AF253562 racemase [Enterococcus faecalis] 
Identities = 78/262 (29%) , Positives = 129/262 (48%) , Gaps = 29/262 (11%) 



Query: 13 KQHNTSMISLLQYLFSILVILVHSGRLFS-QDVIHFTFKBFLGRMAVPYFLICTAFFLRG 71
K + S I +++ ++L++ +H+ LFS + +F F + +AVP+F + + FFL
Sbjct: 3 KNESYSGIDYFRFIAALLIVAIHTSPLFSFSETGNFIFTRIVAPVAVPFFFMTSGFFL-- 60

Query: 72 RIQQGLCNHSYFRKLIKK YSMWTI IYLPY GYFFFESLNIAKIYLLPGFIVAF 123
I + CN IKK Y + ++Y+P GYF ++L LP I
Sbjct: 61 -ISRYTCNAEKLGAFIKKTTLIYGVAILLYIPINVYNGYFKMDNL LPNIIKDI 112

Query: 124 LYLGMSHTLWYIPAVILGWVIIQGLLKYVGTRGTFITWvLYCIGAV-ETYSVFIQSTKF 182
++ G + LWY+PA I+G I L+K V R F+ +LY IG ++Y ++S
Sbjct: 113 VFDGTLYHLWYLPASIIGAAIAWYLVKKVHYRKAFLIASILYIIGLFGDSYYGIVKSVSC 172

Query: 183 YPLMSTYMS I FQT TRNGLFYTPVYLLAGYLLYDYFNTDLFTKSRGLK-YILFLLLLA 238
L Y IFQ TRNG+F+ P++ + G + D + + + K ++YLFL+
Sbjct: 173 - -LNVFYNLIFQLTDYTRNGIFFAPIFFVLGGYISD- -SPNRYRKKNYIRIYSLFCLMFG 228

Query: 239 LENVLIYFN-QGLDKNFFLLAP 259
L +F+ Q D + LL P
Sbjct: 229 KTLTLQHFDIQKHDSMYVLLLP 250





No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8589> and protein <SEQ ID 8590> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: 0.23 
GvH: Signal Score (-7.5): -5.77 

Possible site: 34 
>>> Seems to have an uncleavable N-term signal seq



ALOM program 
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.68 threshold: 0 
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97 


- 113) 


PERIPHERAL Likelihood = 5.78 10
modified ALOM score: 1.64 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 .3272 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 



A related GBS gene <SEQ ID 8591> and protein <SEQ ID 8592> were also identified. Analysis of this
protein sequence reveals the following: 
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Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: 11.50 
GvH: Signal Score (-7.5): -2.69 

Possible site: 32 
>>> Seems to have an uncleavable N-term signal seq 



ALOM program count: 9 value: -8.65 threshold: 0.0
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101 
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Likelihood 
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152 
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168) 


INTEGRAL 


Likelihood 
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144) 


INTEGRAL 


Likelihood 
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Transmembrane 
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( 
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293) 


INTEGRAL 


Likelihood 




-0 
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Transmembrane 


44 


- 60 


( 


43 




60) 


PERIPHERAL Likelihood = 5.78 190
modified ALOM score: 2.23 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 . 4461 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF00153 (307 - 1140 of 1632) 

GP|7960293|gb|AAF71283.1|AF253562_7|AF253562(2 - 284 of 711) racemase {Enterococcus faecalis}
%Match =8.5 

%Identity =32.7 %Similarity =54.0 

Matches = 91 Mismatches = 113 Conservative Sub.s = 59 
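The summary above prints percentages but not the alignment length they were computed over. As a hedged illustration, assume that %Identity is Matches divided by the aligned length (including gap positions) and that %Similarity is Matches plus Conservative Sub.s divided by the same length; neither definition is stated in the source, and the function name implied_alignment_length is hypothetical. Under those assumptions the length can be recovered by searching for the smallest value that reproduces both printed figures to one decimal place.

```python
from typing import Optional


def implied_alignment_length(
    matches: int,
    conservative: int,
    pct_identity: float,
    pct_similarity: float,
) -> Optional[int]:
    """Smallest length N reproducing both printed percentages, or None.

    Assumed definitions (not stated in the source):
      %Identity   = 100 * matches / N
      %Similarity = 100 * (matches + conservative) / N
    """
    similar = matches + conservative
    for n in range(similar, 10 * similar):
        if (round(100 * matches / n, 1) == pct_identity
                and round(100 * similar / n, 1) == pct_similarity):
            return n
    return None


# Figures from the racemase summary above.
length = implied_alignment_length(91, 59, 32.7, 54.0)
counted = 91 + 113 + 59  # matches + mismatches + conservative substitutions
print(length, length - counted)
```

If the assumed definitions hold, the difference between the implied length and the sum of the three printed counts would correspond to gap positions in the alignment.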

150 180 210 240 270 300 330 360 

CEISFFIS*YG**GINJ^QIPFKflFQ*LFGIIEIFF*RDWYHSNDm*KVMLRMKRSQCVDNKQHOTSMIS 

| : | 1 :::::: 
MTKNESYSGIDYFRFIAAL 



390 417 447 477 507 537 555 

LVILVHSGRLFS-QDVIHFTFKSFLGRMAVPYFLICTAFFLRGRIQQGLCNHSYFRKLIKK YSMWTI IYLP Y- 

h: :|: III = = I I = HIM-" = III I = II = =111 I = :*|:| I 
LIVAIHTSPLFSFSETGNFIFTRIVAPVAVPFFFMTSGFFL- - - ISRYTCNAEKLGAFIKKTTLIYGVAILLYIPINVYN 
30 40 50 60 70 80 90 

603 633 663 693 723 753 783 810 

GYFFFESI^IAKIYLLPGFIVAFLYLGMSHTLWYIPAVILGWVIIQGLLKYVGTRGTFITVVVLYCIGAV-ETYSVFIQS 

III ::| II I = = I = 111 = 11 hi I Ml I 1= =11 II = = l = = l 

GYFKMDNL LPNIIKDIVFDGTLYHLWYLPASIIGAAIAWYLVKKVHYRKAFLIASILYIIGLFGDSYYGIVKS 

110 120 130 140 150 160 

840 891 921 951 978 1008 1035 

TKFYPLMSTYMSIFQTT RNGLFYTPVYLLAGYLLYDYFNTDLFTKSRGLK-YILFLLLLALENVLIYFN-QGLDKNF 

I I III I 111=1= 1==== 1=11 =1 == I II l== 1=1=1 I = 

--VSCLWFYNLIFQLTDYTRNGIFFAPIFFVLGGYISDSPNR--YRKKNYIRIYSLFCLMFGKTLTLQHFDIQKHDSMY 
180 190 200 210 220 230 240 

1053 1080 1110 1140 1170 1200 1230 1260 

FLLAP LCAVFL - FNWS IRTSLFKEYRLS PLKQLS VYYFFLPPLFIGI VSYCLKSTSLVAHHQGKVI FWTLALTHA 

III I ==l I II I = I I = III 
VLLLPSWCLFI^LLHFRGKRRTGL-RTISLDQLYHSSVYDCCNTIVCIAELLHL 

260 270 280 290 300 310 320 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
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Example 515 

A DNA sequence (GBSx0553) was identified in S.agalactiae <SEQ ID 1649> which encodes the amino 
acid sequence <SEQ ID 1650>. Analysis of this protein sequence reveals the following: 

Possible site: 43
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3088 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 516 

A DNA sequence (GBSx0554) was identified in S.agalactiae <SEQ ID 1651> which encodes the amino
acid sequence <SEQ ID 1652>. Analysis of this protein sequence reveals the following:

Possible site: 35
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1446 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 517 

A DNA sequence (GBSx0555) was identified in S.agalactiae <SEQ ID 1653> which encodes the amino 
acid sequence <SEQ ID 1654>. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 8.28

GvH: Signal Score (-7.5): -2.11

Possible site: 20
>>> Seems to have a cleavable N-term signal seq.
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modified ALOM score: 2.17 
*** Reasoning Step: 3 



Final Results 
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bacterial membrane Certainty=0 .4333 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 518 

A DNA sequence (GBSx0556) was identified in S.agalactiae <SEQ ID 1655> which encodes the amino
acid sequence <SEQ ID 1656>. This protein is predicted to be ABC transporter (ATP-binding prot).
Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1510 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10199> which encodes amino acid sequence <SEQ ID 
10200> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB88481 GB:AL353816 putative ABC transport system ATP-binding
protein [Streptomyces coelicolor A3(2)]

Identities = 104/284 (36%), Positives = 159/284 (55%), Gaps = 18/284 (6%)

Query: 6 TMLLQLDNITKSYGKKIVLNQISYQFTPGLYGLLGANGTGKTTLLNLMSHFTLADSGNIY 65 
T + ++ YG+ L+ +S + TPG+ GLLG NG GKTTLL +++ AD G 
30 Sbjct: 2 TPTVSASGLSLHYGRTRALDDVSLRLTPGVTGLLGPNGAGKTTLLRVLATAVPADRGAFT 61 

Query: 66 WNGQEQS EEFYRHIGFLPQHFRYYDQFTGIAFLNYIATLKGV-DKKKAKQEIPRL 119 

G + +E R +G+LPQ ++ FT F++Y+A LK + D+++ +E+ R+ 

Sbjct: 62 VLGHDPGSSRGRQEVRRRLGYLPQTPGFHPDFTAFEFVDYVA1LKELADRRERHREVRRV 121 


Query: 120 LELVGLGDVGKKKISSYSGGMKQRLGIAQALINDPElIilLDEPTVGLDPKERVKFRHILS 179 

LE V LG+V ++I SGGM+QR+ +A AL+ DP L+LDEPTVGLDP++R++FR +++ 
Sbjct: 122 LEEVDLGETOGRRIKKLSGGmQRVALAAALVGDPGFLVLDEPTVGLDPEQRMRFRELIA 181 

40 Query: 180 QLSTNKI I ILSTHIVSDVEAVAKEI IVLKNGKFIEHGNTAQLLKTIEGKVWEIT -TEPGL 238 

+ ++LSTH DV + +IV+ G G A+L G+VW T +PG 

Sbjct: 182 GAGEGRTVLLSTHQTEDVAMLCHRVIVMAAGAWFDGTPAELTARAAGRVWSSTEKDPG- 240 

Query: 239 SQIPNIAIVNEKVFSDSRVFRWSDICPSDSAQLWPTLEDFYI 282 
45 A + + S FR V D P A+ PTLED Y+ 

Sbjct: 241 AKAGWRTGTGS - - FRNVGD - - PPPGAEPAEPTLEDGYL 274 

There is also homology to SEQ ID 686. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
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Example 519 

A DNA sequence (GBSx0557) was identified in S.agalactiae <SEQ ID 1657> which encodes the amino 
acid sequence <SEQ ID 1658>. This protein is predicted to be response regulator. Analysis of this protein 
sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3781 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC10170 GB:AJ278301 response regulator [Streptococcus pneumoniae]
Identities = 136/242 (56%), Positives = 183/242 (75%)

Query: 1 MNIFILEDDFVQQAHFEKIIKEIRVQYNLHFKTVETPAKPVQLLESIYEIGLHNLFFLD1 60 

M IF+LEDDF QQ E I+++ ++++ + E F KP QLL ++E G H LFFLDI 
Sbjct: 1 MRIFVLEDDFSQQTR1ETTIEKLLKEHHITLSSFEVFGKPDQLLAEVHEKGAHQLFFLDI 60 


Query: 61 EIKNDEQMGLEVAKQIRQVDPYAQIVFVTTHSELMPLTFRYQVSALDyiDKGLSQEEFSQ 120 

EI+N+E GLEVA++IR+ DPYA IVFVTTHSE MPL+ FRYQVSALDYIDK LS EEF 
Sbjct: 61 EIRNEEMKGLEVARKIREQDPYALIVFVTTHSEFMPLSFRYQVSALDYIDKALSAEEFES 120 

25 Query: 121 RIEEVLLYVDGICNKPLVENSFYFKSRYSQVQLPFNDLLYIETSSRSHRVVLYTEKDRME 180 

RIE LLY + +K L E+ FYFKS+++Q Q PF ++ Y+ETS R HRV+LYT+ DR+E 
Sbjct: 121 RIETALLYANSQDSKSIAEDCFYFKSKFAQFQYPFKEVYYLETSPRPHRVILYTKTDRIjE 180 

Query: 181 FTATLGDILKQEPRLFQCHRSFLVNPLiNIFKVDRIDRLVYFQNGTTCLVSRNKVRDIVSI 240 
30 FTA+L ++ KQEPRL QCHRSFL+NP N+ +D+ ++L++F NG +CL++R KVR++ 

Sbjct: 181 FTASLEEVFKQEPRLLQCHRSFLINPANWHLDKKEKLLFFPNGGSCLIARYKVREVSEA 240 

Query: 241 VD 242 
++ 

35 Sbjct: 241 IN 242 
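The summary line that heads each alignment (for example "Identities = 136/242 (56%)") can be read mechanically. The following is a minimal sketch in Python and is not part of the original analysis; the function names are hypothetical, and the assumption that the printed percentage is the truncated integer ratio is inferred only from the figures on that line, not from any statement in the text.

```python
import re

# Matches entries such as "Identities = 136/242 (56%)".
# Whitespace is optional everywhere so that spaced-out scanned variants still parse.
_SUMMARY = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {label: (count, aligned_length, printed_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in _SUMMARY.finditer(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated toward zero (an assumed convention)."""
    return (100 * count) // length


summary = parse_summary("Identities = 136/242 (56%) , Positives = 183/242 (75%)")
identities, aligned_length, printed = summary["Identities"]
```

Comparing `truncated_percent(identities, aligned_length)` with `printed` gives a quick consistency check on a transcribed summary line.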

A related DNA sequence was identified in S.pyogenes <SEQ ID 1659> which encodes the amino acid 
sequence <SEQ ID 1660>. Analysis of this protein sequence reveals the following: 

Possible site: 44
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2098 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 106/235 (45%), Positives = 159/235 (67%)

Query: 1 MNIFILEDDFVQQAHFEKIIKEIRVQYNLHFKTVETFAKPVQLLESIYEIGLHNLFFLDI 60
 MNIFILEDDF+QQ E I+ I + + +E F+ P +L ESI E G H L+FLDI
Sbjct: 2 MNIFILEDDFIQQTRIESIVVGILKETRIPCNQLEVFSTPQKLFESIQERGDHQLYFLDI 61

Query: 61 EIKNDEQMGLEVAKQIRQVDPYAQIVFVTTHSELMPLTFRYQVSALDYIDKGLSQEEFSQ 120
 EI + GLE+A IRQ DP A IVFVTTHSE P++F+Y+VSALD+IDK Q++F +
Sbjct: 62 EIGEYTRCGLELAAAIRQKDPNAVIVFVTTHSEFAPISFKYKVSALDFIDKAGGQKQFKE 121

Query: 121 RIEEVLLYVDGICNKPLVENSFYFKSRYSQVQLPFNDLLYIETSSRSHRVVLYTEKDRME 180
 +IEE + Y + + ++ F F++ ++++LP+ D+LY T++ H+V L+T+ +R+E
Sbjct: 122 QIEECIRYTYDMMSSRESKDMFLFETPQTRLKLPYKDILYFATATTPHKVCLWTQTERLE 181

Query: 181 FTATLGDILKQEPRLFQCHRSFLVNPLNIFKVDRIDRLVYFQNGTTCLVSRNKVR 235
 F L +I P+LF CHRS+LVN + ++D+ +L+YF+NG +C+VSR K++
Sbjct: 182 FYGNLSEIQAVAPKLFLCHRSYLVNLDKVVRIDKSKQLLYFENGDSCMVSRLKMK 236

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 520 

A DNA sequence (GBSx0558) was identified in S.agalactiae <SEQ ID 1661> which encodes the amino 
acid sequence <SEQ ID 1662>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2651 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1663> which encodes the amino acid 
sequence <SEQ ID 1664>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0535 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 177/269 (65%) , Positives = 219/269 (80%) 

Query: 6 MAKCLTLNTHSVmEVNALKKLFDIAEHIFREKYDIICLQEVNQSISSPLAKSSPNYHPIE 65 

M K LTLNTHSWM+ N LKKL LAEHI EKYDIICLQE+NQ I S LA P Y + 
Sbjct: 1 MTKVLTLNTHSVWQANTLKKLVALAEHILAEKYDIICLQEINQLIESELATDLPRYQALS 60 

Query: 66 GTPALHQDNFALQLVHYLNLQGLHYHWTWAYNHIGYSKYHEGVAILSLKPLKPEDILVSA 125

GTP++H+D+FAL L+HYL +G HY+W+WAYNHIGY Y EGVAILS +P+ DILVSA
Sbjct: 61 GTPSIHKDHFALLLIHYLQKRGQHYYWSWAYNHIGYDIYQEGVAILSKQPIHVSDILVSA 120

Query: 126 VDDETDYHTRRALVAETTLNDKVVTVVSLHFSWFEKGFAEEWKRLETTLLEVETPLLLMG 185 

+DDETDYHTRR+L+A+TTL+ K V W++H SWF+KGF EW++LE LL + PLLLMG 
Sbjct: 121 ^DETDYHTRRSLIAKTTLDGKEVAVVNVHLSWFDKGFLGEWEKLEKELLTLNCPLLLMG 180 

Query: 186 DFNNPTGNQGYELVIiNSPIALKDSHQIANHvEGDHTIMADIDGWEGNKKALKVDHIFTSE 245 

DFNNPT GY++++ SPL L+DSH+ A+ HVFGDH+ 1 + AD IDGW+GNK+ALKVDH+ FTS+ 
Sbjct: 181 DFNNPTDQDGYQVMMGSPLDLQDSHKGADHVFGDHSIVADIDGWQGNKEALKVDHVFTSK 240 

Query: 246 DLSISSSQWFEGGEAPWSDHYGLEITM 274 

D I SS++ FEGG+APWSDHYGLE+T+ 
Sbjct: 241 DFIIRSSKITFEGGDAPWSDHYGLEVTL 269 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 521

A DNA sequence (GBSx0559) was identified in S.agalactiae <SEQ ID 1665> which encodes the amino 
acid sequence <SEQ ID 1666>. This protein is predicted to be PTS system, glucose-specific enzyme II, A 
component (ptsG). Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood = -8.07 Transmembrane 193 - 209 ( 189 - 217)
INTEGRAL Likelihood = -7.86 Transmembrane 28 - 44 ( 24 - 48)
INTEGRAL Likelihood = -6.48 Transmembrane 431 - 447 ( 421 - 449)
INTEGRAL Likelihood = -2.92 Transmembrane 153 - 169 ( 153 - 170)
INTEGRAL Likelihood = -2.81 Transmembrane 93 - 109 ( 93 - 111)
INTEGRAL Likelihood = -2.39 Transmembrane 370 - 386 ( 370 - 388)
INTEGRAL Likelihood = -2.28 Transmembrane 68 - 84 ( 68 - 84)



Final Results

bacterial membrane Certainty=0.4227 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10201> which encodes amino acid sequence <SEQ ID
10202> was also identified.



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD00281 GB:U78600 putative ptsG protein [Streptococcus mutans]
Identities = 294/409 (71%), Positives = 342/409 (82%), Gaps = 7/409 (1%)

Query: 293 DLINLKGS-NSSQYHHLLTSVTPARFKVGQMIGASGILMGLSYAMYRNVDKDKKLKYKSM 351
 DLI+LKG+ + SQYHHLLTSVTPARFKVGQMIG+SGILMGL+ AMYRNVD DKK KYK M
Sbjct: 3 DLIHLKGAGHMSQYHHLLTSVTPARFKVGQMIGSSGILMGLTLAMYRNVDPDKKEKYKGM 62

Query: 352 FISAAAATFLTGVTEPIEYMFMFAAMPLYLVYAWC^I^AMADIVNLRVHSFGNIEFLT 411
 F+SAA A FLTGVTEP+EYMFMFAA+PLYLVYAWQG AFA AD+++LRVHSFGNIEFLT
Sbjct: 63 FLSAAVAVFLTGVTEPLEYMFMFAALPLYLVYAWQGLAFASADLIHLRVHSFGNIEFLT 122

Query: 412 RVPMGIKAGLGGDIFNFVWVTLLFAVLMYFIANFMIKKFNLATAGRNGNYDNEEVDNAPS 471
 + PM IKAGL DI NF+ V+++F V MYFI NFMIKKFNLAT+GRNGNYD + D +
Sbjct: 123 KTPMAIKAGLAMDIVNFIVVSVVFGVAMYFITNFMIKKFNLATSGRNGNYDTGD-DASDE 181

Query: 472 TAS----GSADANSQVVQVINLLGGRDNIEDVDACMTRLRVTVKDGNSVGSEAAWKKAGA 527
 TAS G+A+ANSQ+V++INLLGG++NI DVDACMTRLR+TV D VG EAAWKKAGA
Sbjct: 182 TASNSNAGTANANSQIVKIINLLGGKENISDVDACMTRLRITVTDVAKVGDEAAWKKAGA 241

Query: 528 MGLVLKGNGVQAIYGPKADVLKSDIQDLLDSGTVIPIVDLETGQPVAAAPVTTYKGITEE 587
 MGL++KGNGVQA+YGPKADVLKSDIQDLLDSG IP D+ + A V ++KG+TEE
Sbjct: 242 MGLIVKGNGVQAVYGPKADVLKSDIQDLLDSGVDIPKTDVTAPEEDKTADV-SFKGVTEE 300

Query: 588 IVSVANGQvFMDVVKDPVFSQKMMGDGFAVEPTDGNIYVPVSGTVTSVFPTKHAFGLLT 647
 + +VA+GQV + V DPVFSQKMMGDGFAVEP +GNIY PV+G VTSVFPTKHA GLLT
Sbjct: 301 VATVADGQVLPITQVHDPVFSQKMMGDGFAVEPENGNIYSPVAGLVTSVFPTKHALGLLT 360

Query: 648 ESGLEVLVHIGLDTVALDGQPFEVKISSGQKVVAGDLAWADLEAIKAA 696
 + GLEVLVH+GLDTVAL+G PF K+ GQ+V GDL + VADLEAIK+A
Sbjct: 361 DDGLEVLVHVGLDTVALNGAPFSAKVKDGQRVALGDLLLVADLEAIKSA 409

A related DNA sequence was identified in S.pyogenes <SEQ ID 1667> which encodes the amino acid
sequence <SEQ ID 1668>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-13.43 Transmembrane 186 - 202 ( 181 - 213)
INTEGRAL Likelihood = -6.79 Transmembrane 419 - 435 ( 412 - 442)
INTEGRAL Likelihood = -5.52 Transmembrane 61 - 77 ( 57 - 82)
INTEGRAL Likelihood = -3.56 Transmembrane 363 - 379 ( 363 - 381)
INTEGRAL Likelihood = -1.97 Transmembrane 143 - 159 ( 142 - 160)
INTEGRAL Likelihood = -0.16 Transmembrane 343 - 359 ( 343 - 359)

Final Results

bacterial membrane Certainty=0.6371 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD00281 GB:U78600 putative ptsG protein [Streptococcus mutans]
Identities = 288/407 (70%), Positives = 331/407 (80%), Gaps = 2/407 (0%)

Query: 286 DLVHLKGSD-ASAYSHLMDSVTPARFKVGQMIGATGTLMGVALAMYRNVDADKKHTYKMM 344
 DL+HLKG+ S Y HL+ SVTPARFKVGQMIG++G LMG+ LAMYRNVD DKK YK M
Sbjct: 3 DLIHLKGAGHMSQYHHLLTSVTPARFKVGQMIGSSGILMGLTLAMYRNVDPDKKEKYKGM 62

Query: 345 FISAAAAVFLTGVTEPLEYLFMFAAMPLYIVYALVQGASFAMADLVNLRVHSFGNIELLT 404
 F+SAA AVFLTGVTEPLEY+FMFAA+PLY+VYA+VQG +FA ADL++LRVHSFGNIE LT
Sbjct: 63 FLSAAVAVFLTGVTEPLEYMFMFAALPLYLVYAWQGLAFASADLIHLRVHSFGNIEFLT 122

Query: 405 RTPMALKAGLGMDVINFVWVSVLFAVIMYFIADMMIKKMHLATAGRLGNYDA-DILGDRN 463
 +TPMA+KAGL MD++NF+ VSV+F V MYFI + MIKK +LAT+GR GNYD D D
Sbjct: 123 KTPMAIKAGLAMDIVNFIVVSVVFGVAMYFITNFMIKKFNLATSGRNGNYDTGDDASDET 182

Query: 464 TQTRPTQVADSNSQVVQIVNLLGGAGNIDDVDACMTRLRVTVKDPAKVGAEDDWKKAGAI 523
 A+ +NSQ+V+I+NLLGG NI DVDACMTRLR+TV D AKVG E WKKAGA+
Sbjct: 183 ASNSNAGTANANSQIVKIINLLGGKENISDVDACMTRLRITVTDVAKVGDEAAWKKAGAM 242

Query: 524 GLIQKGNGVQAVYGPKADILKSDIQDLLDSGALIPEVNMSQLTSKPTPAKDFKHVTEDVL 583
 GLI KGNGVQAVYGPKAD+LKSDIQDLLDSG IP+ +++ T FK VTE+V
Sbjct: 243 GLIVKGNGVQAVYGPKADVLKSDIQDLLDSGVDIPKTDVTAPEEDKTADVSFKGVTEEVA 302

Query: 584 SVADGMVLPITGVKDQVFAAKMMGDGFAVEPTHGNIYAPVAGLVTSVFPTKHAFGLLTDN 643
 +VADG VLPIT V D VF+ KMMGDGFAVEP +GNIY+PVAGLVTSVFPTKHA GLLTD+
Sbjct: 303 TVADGQVLPITQVHDPVFSQKMMGDGFAVEPENGNIYSPVAGLVTSVFPTKHALGLLTDD 362

Query: 644 GLEVLVHVGLDTVALNGVPFSVKVSEGQRVHAGDLLVVADLAAIKSA 690
 GLEVLVHVGLDTVALNG PFS KV +GQRV GDLL+VADL AIKSA
Sbjct: 363 GLEVLVHVGLDTVALNGAPFSAKVKDGQRVALGDLLLVADLEAIKSA 409

An alignment of the GAS and GBS proteins is shown below: 

Identities = 517/731 (70%), Positives = 606/731 (82%), Gaps = 7/731 (0%)

Query: 8 MKNNVKQLFSFEFWQKFGKRLMWIAVMPAAGLMVSIGNSISLLDPSNVLLGRIANVIAQ 67
 MK + KQLF FEFWQKFGK LMWIAVMPAAGLM+SIGNSI +++ + L + N+IAQ
Sbjct: 1 MKTSFKQLFRFEFWQKFGKCLMWIAVMPAAGLMISIGNSIPMINHDSAFLASLGNIIAQ 60

Query: 68 IGWGVIGNLHILFALAIGGSWAKERAGGAFAAGLSFILINLITGNFFGVKTDMLADSKAT 127
 IGW VI NLH+LFALAIGGSWAKERAGGAFA+GL+F+LIN ITG F+GV + MLAD +A
Sbjct: 61 IGWAVIVNLHLLFALAIGGSWAKERAGGAFASGLAFVLINRITGAFYGVSSTMLADPEAK 120

Query: 128 VQTVFGATIRVSDYFVNVLGQPAI^GVFVGIISGFVGATAFNKYYNYRKLPDALTFFNG 187
 + ++ G + V DYF +VL PALN GVFVGII+GFVGATA+NKYYNYRKLP+ LTFFNG
Sbjct: 121 ITSLLGTQMIVXDYFTSVLESPALNTGVFVGIIAGFVGATAYNKYYNYRKLPEVLTFFNG 180

Query: 188 KRWPFWIYRSVIVALILSVFWPWQSGINGFGKWIASSQDSAPILAPFVYGTLERLLL 247
 KRFVPFWI RS+ VALIL V WPV+QSGIN FG WIASSQDSAPILAPF+YGTLERLLL
Sbjct: 181 KRFVPFWILRSIFVALILWVWPVIQSGINSFGMWIASSQDSAPILAPFLYGTLERLLL 240

Query: 248 PFGLHHMLTIPMNYTQLGGTYTVLTGATKGAQVLGQDPLWLAWVGDLINLKGSNSSQYHH 307
 PFGLHHMLTIPMNYT LGGTY V+TGA G +V GQDPLWLAWV DL++LKGS++S Y H
Sbjct: 241 PFGLHHMLTIPMNYTALGGTYEVMTGAAAGTKVFGQDPLWLAWVTDLVHLKGSDASAYSH 300

Query: 308 LLTSVTPARFKVGQMIGASGILMGLSYAMYRNVDKDKKLKYKSMFISAAAATFLTGVTEP 367
 L+ SVTPARFKVGQMIGA+G LMG++ AMYRNVD DKK YK MFISAAAA FLTGVTEP
Sbjct: 301 LMDSVTPARFKVGQMIGATGTLMGVALAMYRNVDADKKHTYKMMFISAAAAVFLTGVTEP 360

Query: 368 IEYMFMFAAMPLYLWAWQGCaFAMADIVNLRVHSFGNIEFLTRVPMGIKAGLGGDIFN 427
 +EY+FMFAAMPLY+VYA+VQG +FAMAD+VNLRVHSFGNIE LTR PM +KAGLG D+ N
Sbjct: 361 LEYLFMFAAMPLYIVYALVQGASFAMADLVNLRVHSFGNIELLTRTPMALKAGLGMDVIN 420

Query: 428 FVWVTLLFAVLMYFIANFMIKKFNLATAGRNGNYDNEEVD--NAPSTASGSADANSQVVQ 485
 FVWV++LFAV+MYFIA+ MIKK +LATAGR GNYD + + N + + AD+NSQVVQ
Sbjct: 421 FVWVSVLFAVIMYFIADMMIKKMHLATAGRLGNYDADILGDRNTQTRPTQVADSNSQVVQ 480

Query: 486 VINLLGGRDNIEDVDACMTRLRVTVKDGNSVGSEAAWKKAGAMGLVLKGNGVQAIYGPKA 545
 ++NLLGG NI+DVDACMTRLRVTVKD VG+E WKKAGA+GL+ KGNGVQA+YGPKA
Sbjct: 481 IVNLLGGAGNIDDVDACMTRLRVTVKDPAKVGAEDDWKKAGAIGLIQKGNGVQAVYGPKA 540

Query: 546 DVLKSDIQDLLDSGTVIPIVDLE--TGQPVAAAPVTTYKGITEEIVSVANGQVEALDWK 603
 D+LKSDIQDLLDSG +IP V++ T +P P +K +TE+++SVA+G V + VK
Sbjct: 541 DILKSDIQDLLDSGALIPEVNMSQLTSKP---TPAKDFKHVTEDVLSVADGMVLPITGVK 597

Query: 604 DPVFSQKMMGDGFAVEPTDGNIYVPVSGTVTSVFPTKHAFGLLTESGLEVLVHIGLDTVA 663
 D VF+ KMMGDGFAVEPT GNIY PV+G VTSVFPTKHAFGLLT++GLEVLVH+GLDTVA
Sbjct: 598 DQVFAAKMMGDGFAVEPTHGNIYAPVAGLVTSVFPTKHAFGLLTDNGLEVLVHVGLDTVA 657

Query: 664 LDGQPFEVKISSGQKVVAGDLAWADLEAIKAAGKETSVIIVFTNVSDIKTVKLEKSGPQ 723
 L+G PF VK+S GQ+V AGDL WADL AIK+A +ET +++ FTN ++I+ V L G Q
Sbjct: 658 LNGVPFSVKVSEGQRVHAGDLLVVADLAAIKSAERETIIVVAFTNTTEIQDVTLTSLGAQ 717

Query: 724 IAKTWAKVEL 734
 AKT VA VEL
Sbjct: 718 PAKTKVATVEL 728

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 522 

A DNA sequence (GBSx0560) was identified in S.agalactiae <SEQ ID 1669> which encodes the amino
acid sequence <SEQ ID 1670>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2266 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 523 

A DNA sequence (GBSx0561) was identified in S.agalactiae <SEQ ID 1671> which encodes the amino
acid sequence <SEQ ID 1672>. This protein is predicted to be alkaline phosphatase synthesis sensor protein
PhoR (hpkA). Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.96 Transmembrane 160 - 176 ( 148 - 183)
INTEGRAL Likelihood = -8.65 Transmembrane 20 - 36 ( 13 - 41)
Final Results

bacterial membrane Certainty=0.6583 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
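The INTEGRAL lines reported for this protein follow a fixed layout: a likelihood score, the residue range of the predicted transmembrane segment, and a wider range in parentheses. The following is a minimal sketch in Python of how such lines might be read into records; it is not part of the original analysis. The function names are hypothetical, and treating the most negative likelihood as the strongest prediction is an assumption based on the order in which the segments are listed.

```python
import re

# One INTEGRAL line, e.g.
# "INTEGRAL Likelihood =-13.96 Transmembrane 160 - 176 ( 148 - 183)"
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line):
    """Return a record for one INTEGRAL line, or None if the line does not match."""
    m = _INTEGRAL.search(line)
    if m is None:
        return None
    return {
        "likelihood": float(m.group(1)),
        "core": (int(m.group(2)), int(m.group(3))),
        "extended": (int(m.group(4)), int(m.group(5))),
    }


def core_length(segment):
    """Number of residues in the core transmembrane range (inclusive)."""
    start, end = segment["core"]
    return end - start + 1


lines = [
    "INTEGRAL Likelihood =-13.96 Transmembrane 160 - 176 ( 148 - 183)",
    "INTEGRAL Likelihood = -8.65 Transmembrane 20 - 36 ( 13 - 41)",
]
segments = [s for s in map(parse_integral, lines) if s is not None]
# Assumption: the most negative likelihood marks the strongest prediction.
strongest = min(segments, key=lambda s: s["likelihood"])
```

With the two lines above, `strongest` is the segment spanning residues 160 to 176.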

A related GBS nucleic acid sequence <SEQ ID 8595> which encodes amino acid sequence <SEQ ID 8596> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6
SRCFLG: 0

McG: Length of UR: 26

Peak Value of UR: 3.27
Net Charge of CR: 3
McG: Discrim Score: 14.63
GvH: Signal Score (-7.5): -5.64

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -13.96 threshold: 0.0

INTEGRAL Likelihood =-13.96 Transmembrane 152 - 168 ( 140 - 175)
INTEGRAL Likelihood = -8.65 Transmembrane 12 - 28 ( 5 - 33)
PERIPHERAL Likelihood = 1.59 135
modified ALOM score: 3.29
icm1 HYPID: 7 CFP: 0.658

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.6583 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS gene <SEQ ID 8593> and protein <SEQ ID 8594> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 

McG: Discrim Score: 14.63 

GvH: Signal Score (-7.5): -5.64 
Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

ALOM program count: 2 value: -13.96 threshold: 0.0 

INTEGRAL Likelihood =-13.96 Transmembrane 152 - 168 ( 140 - 175) 
INTEGRAL Likelihood = -8.65 Transmembrane 12 - 28 ( 5 - 33) 
PERIPHERAL Likelihood = 1.59 135 
modified ALOM score: 3.29 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.6583 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

34.9/61.1% over 363aa

Thermotoga maritima

EGAD|131465| sensor histidine kinase HpkA Insert characterized
GP|1575578|gb|AAC44437.1||U67196 histidine protein kinase Insert characterized
GP|4982228|gb|AAD36721.1|AE001807_12|AE001807 sensor histidine kinase HpkA Insert
characterized

PIR|C72228|C72228 sensor histidine kinase HpkA - (strain MSB8) Insert characterized

ORF00680(919 - 1977 of 2277)
EGAD|131465|TM1654 (48 - 411 of 412) sensor histidine kinase HpkA {Thermotoga maritima}
GP|1575578|gb|AAC44437.1||U67196 histidine protein kinase {Thermotoga maritima}
GP|4982228|gb|AAD36721.1|AE001807_12|AE001807 sensor histidine kinase HpkA {Thermotoga
maritima} PIR|C72228|C72228 sensor histidine kinase HpkA - Thermotoga maritima (strain
MSB8)

%Match =13.6 

%Identity =34.8 %Similarity =61.0 

Matches = 125 Mismatches = 134 Conservative Sub.s = 94 

720 750 780 810 840 870 900 930

AAQRL1MGTIWLSVAQQTIFYLLLGMISPLAIIILLAIILSVLIARYIAKKVSEPIJOTIDLDHPLSNDSYEEITPLLRR 

: - =1 11= |::|| |: : = 1= I 

MSVFLFVIVAVLFVLLFLVFKKRLSEYKILIEKLSDMLGEKGVPPLYLFER 
10 20 30 40 50 

15 

960 990 1020 1050 1080 1110 1140 1170 

LDSHQAKIQHQKLLLQKRQKEFDTIISKIKEGMILLDDQARIVSINAEALKLFQINDDWHGRFMMEVSRDLTLKDLIDQG 



20 70 80 90 100 110 120 130 

1197 1215 1245 1275 1305 1335 1365 1395 

LKGKK- KEAN 1GIEN1SIHYRVLVRPTTDNNRVTGLVVLLFDVTDQLQMEQLQREFTANVSHELKTPLHVISGYSELL 

:| :: :| :| I := I I I = = 1 = 1 = III = ---III I llllhlll I 11 = 1 I 

IKSEEPQEGTLVTYVGNEKKYFHVKVIPVELKSGDKIFVILFHDVTKERKLDEMRREFIATVSHELRTPLTSIHGYAETL

150 160 170 180 190 200 210 

1452 1482 1512 1539 1569 1599 1629 

ANQWPISKE-VPQFAAKIHKESERLVKLVEDIINLSHLDEQE-KLPQETV 

30 : |:| | :| I =11 1= =1= l = = = l -II : = I = I - = I =11 = === 1 = 

LEDDLENKELVTOJFLKIIEEESARMTRLINDLLDLEKIEESF^FEMKDVDLCEVIEYVYRIIQPIAEENEVDLIVECED 

230 240 250 260 270 280 290 

1659 1689 1713 1767 1797 1827 1857 

AILRGNPVLLNSLVYNLCDNAITYNH--EKGQVNVTLK--NSPDTITLEVSDTGLGIAEKDKKRIFERFYRVDKSRSKIV

-III. I == II 111= I 111= I == -II = =11 III II == = 1111 = 111111 = 11= : 

VWRGNKERLIQMLLNLVDNAVKYTSLKEKGEKKVWVRAYDTPDWVWEVEDTGPGI PKEAQSRI FEKFYRVDKARSRKM 
310 320 330 340 350 360 370 

1887 1917 1947 1977 2007 2037 2067 2097

GGTGLGLSIVKSALDFHNGSIKVDSHLGQGTTMTVLLHKQ*KLTNKSLDDII*TFLVIQKKYISKGLQKTNKCYNKTXX* 

1111111=111= =1 I I 1=1=1 = III I III 1= 
GGTGLGLTIVKTIVDKHGGKIEVESEINQGTLMRVLLPKRR 
390 400 410 

45 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06875 GB:AP001517 two-component sensor histidine kinase 

involved in phosphate regulation [Bacillus halodurans] 
Identities = 176/589 (29%) , Positives = 315/589 (52%) , Gaps = 47/589 (7%) 

Query: 9 MTKKIFRTTLSASLGI VLVTILMIMG FLYNYFNHIQREQLRTQTALAS 56
 MTK +R L+ ++ VT+L++ G +L N + +++E + + +
Sbjct: 1 MTKFRYRLVLA VLTVTLL VMAGLGL VTGQI FKNVYLENLTDRLKKETYLAASMVEN 56

Query: 57 QGI S F - EGKDYFENLKTS -^^VRITWVDNKGQ VLYOTQSDAKHMKNHANRQE I KEAI KSGY 114
 + + FE+ E+ + R+T + G V+ ++ +D M+NHA+R E E ++ G
Sbjct: 57 EAVLFNEVQTLTEE I SQKLDARVT I ILADGTWGESAADPAEMENHADRPEFTE - LEEGI 115

Query: 115 GESTRWSATL-TEKSIYAAQRLN--NGTI--VRLSVAQQTIFYLLLGMISPLAIIILLAI 169
 R+S T+ TE YA N N TI VRL + ++ + + + L+ +A
Sbjct: 116 ---WYSTTVETELLFYAVPIQNEANETIGYVRLGLPIEAvNSVNRTLWAILIVSFTIAF 172

Query: 170 ILSVLIARYIAKKVSEPIiNNI DLDHPLSNDSYEE ITPLLRRLDSHQAKI Q 219
 ++ V + IA ++ P+ + D S +S +E+ Ii R ++ ++
Sbjct: 173 LVIVSOTYRIANQMIRPIESATVVANKLAEGDYQARTSEESRDEVGQL^ 232

Query: 220 HQKLLLQKRQKEFDTIISKIKEGMILLDDQARIVSINAEALKLFQINDD-WHGRFMMEVS 278
Sbjct: 233 QLTKRHQVQKERLETLIENMGSGLILINTRGDISLINKTCHDIFQEDTDLWLHQLYHDVI 292

Query: [residues illegible in source]
 + + ++ +K++ I +E H+ V P +N ++ G+ ++ D+T+
Sbjct: 293 KHKEIIKIVQDIFLTEKRQRRQVKLPIHLEYRHFDVHGAPIVRENGKLKGIALVFHDITE 352

Query: [residues illegible in source]
 ++EQ++++F ANVSHELKTP+ I G++E L + + +E++ QF I KESERL
Sbjct: 353 LKKLEQVRKDFVANVSHELKTPVTSIKGFTETLLDGAMHDEQLRDQFLHIIWKESERLQS 412

Query: [residues illegible in source] 449
 L+ D++ LS +++ +L + NL+ + +V+ L+ KA++K I I+ + E + L G+
Sbjct: 413 LIHDLLELSKIEQNYFQIiNWQQTNLFAWSEVMTLLKGKAEEKGIDISLSAEGSFDLEGD 472

Query: 450 PVLLNSLVYNLCDNAITYNHEKGQVNVTLKNSPDTITLEVSDTGLGIAEKDKKRIFERFY 509
 P L + NL +NAITY G++++ LK+ D + EV+DTG+GI E + RIFERFY
Sbjct: 473 PERLKQIAINLVNNAITYTSNGGRIDIALKDHGDWEFEVNDTGIGIRESEIPRIFERFY 532

Query: 510 RVDKSRSKIVGGTGLGLSIVKSALDFHNGSIKVDSHLGQGTTMTVLLHK 558
 RVD++RS+ GGTGLGL+IVK ++ H G I V+S G+GTT T+ H+
Sbjct: 533 RVDRARSRNSGGTGLGLAIVKHLVEAHQGKILVESEFGKGTTFTIQFHR 581





There is also homology to SEQ ID 1178.

SEQ ID 8594 (GBS340) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 173 (lane 10; MW 86kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 11 (lane 7; MW 61.5kDa) and in Figure
77 (lane 10; MW 62kDa).

Purified GBS340-GST is shown in Figure 223, lane 2; purified GBS340-His is shown in Figure 191, lane 9.

The purified GBS340-GST fusion product was used to immunise mice. The resulting antiserum was used
for Western blot (Figure 254A), FACS (Figure 254B), and in the in vivo passive protection assay (Table
III). These tests confirm that the protein is immunoaccessible on GBS bacteria and that it is an effective
protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 524 

A DNA sequence (GBSx0562) was identified in S.agalactiae <SEQ ID 1673> which encodes the amino
acid sequence <SEQ ID 1674>. This protein is predicted to be phosphate regulon transcriptional regulatory
protein PhoB (phoB). Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2617 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
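Each "Final Results" block lists a certainty value and a verdict for three candidate locations. The following is a minimal sketch in Python of how such a block might be tabulated and the highest-scoring location picked out; it is not part of the original analysis. The function name is hypothetical, and the tolerance for stray spaces inside the numbers is an assumption aimed at scanned copies of the text.

```python
import re

# One result line, e.g. "bacterial cytoplasm Certainty=0.2617 (Affirmative)".
# Spaces inside the number ("0. 2617", "0 . 0000") are tolerated and removed.
_RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d .]+?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for m in _RESULT.finditer(text):
        certainty = float(m.group(2).replace(" ", ""))
        results[m.group(1)] = (certainty, m.group(3).strip())
    return results


sample = """\
bacterial cytoplasm Certainty=0. 2617 (Affirmative)
bacterial membrane Certainty=0 . 0000 (Not Clear)
bacterial outside Certainty=0 . 0000 (Not Clear)
"""
results = parse_final_results(sample)
# Location with the highest certainty value.
best = max(results, key=lambda name: results[name][0])
```

For the sample block, `best` is `"cytoplasm"`, in line with the "Affirmative" verdict on that line.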

A related GBS nucleic acid sequence <SEQ ID 10203> which encodes amino acid sequence <SEQ ID
10204> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC73502 GB:AE000146 positive response regulator for pho
regulon, sensor is PhoR (or CreC) [Escherichia coli K12]
Identities = 98/224 (43%), Positives = 138/224 (60%), Gaps = 2/224 (0%)

Query: 2 IYCVEDDADIREMMLYTLQMAGFKAQGFSSSELFWEAIQEKVPDLILLDIMLPGDDGLTI 61
 I VED+A IREM+ + L+ GF+ + + E PDLILLD MLPG G+
Sbjct: 5 ILVVEDEAPIREMVCFVLEQNGFQPVEAEDYDSAVNQLNEPWPDLILLDWMLPGGSGIQF 64

Query: 62 LERLRRKHQTEMIPVIMTTAKGSEYDKVKGLDLGADDYLVKPFGMMEMISRIKAVLRRSR 121
 ++ L+R+ T IPV+M TA+G E D+V+GL+ GADDY+ KPF E+++RIKAV+RR
Sbjct: 65 IKHLKRESMTRDIPVVMLTARGEEEDRVRGLETGADDYITKPFSPKELVARIKAVMRRIS 124

Query: 122 QVDSKAHIIIGNLEIDPTNYWVKRGTEKIHLTLKEFELLVLFFRNPNRVFTRQELLDKVW 181
 + + I + L +DPT++ V G E + + EF+LL F +P RV++R++LL+ VW
Sbjct: 125 PMAVEEVIEMQGLSLDPTSHRVMAGEEPLEMGPTEFKLLHFFMTHPERVYSREQLLNHVW 184

Query: 182 GEQFLGETRTVDVHIGTLRTKLGEDGY--LIATVRGVGYRLEER 223
 G E RTVDVHI LR L G+ ++ TVRG GYR R
Sbjct: 185 GTNVYVEDRTVDVHIRRLRKALEPGGHDRMVQTVRGTGYRFSTR 228





There is also homology to SEQ ID 1182.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 525 

A DNA sequence (GBSx0563) was identified in S.agalactiae <SEQ ID 1675> which encodes the amino
acid sequence <SEQ ID 1676>. This protein is predicted to be phosphate transport system regulatory
protein (phoU). Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1188 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG08750 GB:AE004948 phosphate uptake regulatory protein PhoU 
[Pseudomonas aeruginosa] 
Identities = 66/213 (30%), Positives = 119/213 (54%), Gaps = 4/213 (1%) 



Query: 2 IRSRFASQLNDLNKEIIFMGALCEDIIGKSLGALTNSNDvYLDDISETYHKIEQMERDIE 61
 I +F ++L D+ ++ MG L E + ++ AL +++ + E +I QMER+I+
Sbjct: 11 ISQQFNAELEDWSHLIAMGGLVEKQvNDAVNALIDADSGLAQQVREIDDQINQMERNID 70

Query: 62 ERCLKLLLRQQPVAKDLRRISSALKMVYDMKRIGAQAYEIAEIVSLGHIIQGSGSERD-- 119
 E C+++L R+QP A DLR IS K V D++RIG +A ++A + + S R
Sbjct: 71 EECVRILARRQPARSDLRLIISISKSVIDLERIGDEASKVARRAI--QLCEEGESPRGYV 128

Query: 120 QLNSMSNOTISMLTKSIDAFIYDNEEQAHQVIEQDRTVNQEFDTIKKQLVLYFSVQDVDG 179
 ++ + + V M+ +++DAF + + A V + D+TV++E+ T ++LV Y
Sbjct: 129 EvRHIGSQVQKIWQEALDAFARFDADIALSVAQVDKTVDREYKTALRELVTYMMEDPRAI 188

Query: 180 EYPIDVLMIAKYLERIGDHTVNIAKWVLFSITG 212
 ++++ + LERIGDH NIA+ V++ + G
Sbjct: 189 SRVLNIIWALRSLERIGDHARNIAELVIYLVRG 221

There is also homology to SEQ ID 1678. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 526

A DNA sequence (GBSx0564) was identified in S.agalactiae <SEQ ID 1679> which encodes the amino
acid sequence <SEQ ID 1680>. This protein is predicted to be ATP-binding cassette protein PstB (pstB-2).
Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2432 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10205> which encodes amino acid sequence <SEQ ID
10206> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD22041 GB:AF118229 ATP-binding cassette protein PstB 
[Streptococcus pneumoniae] 
Identities = 166/245 (67%), Positives = 211/245 (85%), Gaps = 1/245 (0%)

Query: 10 INNLDLYYGEFHALKDVNLDIEEKEITAFIGPSGCGKSTLLKSINRMNDLVKNCKITGDI 69
 + +LDL+YG+F ALK++++ + E++ITA IGPSGCGKST LK++NRMNDLV +C I G +
Sbjct: 6 VVHLDLFYGDFQALKNISIQLPERQITALIGPSGCGKSTFLKTKNRMNDLVPSCHIEGQV 65

Query: 70 TLEGEDVYR-QLDINQLRKKVGMVFQKPNPFPMSIYDNVAFGPRTHGIHSKAELDDIVER 128
 L+ +D+Y + ++NQLRK+VGMVFQ+PNPF MSIYDNVA+GPRTHGI K +LD +VE+
Sbjct: 66 LLDEQDIYSSKFNLNQLRKRVGMVFQQPNPFAMSIYDNVAYGPRTHGIRDKKQLDALVEK 125

Query: 129 SLKQAALWDEVKDRLHKSAI^MSGGQQQRLCIARALAIEPDVLLMDEPTSALDPISTAKI 188
 SLK AA+W+EVKD L KSA+ +SGGQQQRLCIARALA+EPD+LLMDEPTSALDPIST KI
Sbjct: 126 SLKGAAIWEEVKDDLKKSAMSLSGGQQQRLCIARALAVEPDILLMDEPTSALDPISTLKI 185

Query: 189 EELVIQLKKNYTIVIVTHNMQQAVRISDKTAFFLMGEVVEYNKTSQLFSLPQDERTENYI 248
 E+L+ QLKK+YTI+IVTHNMQQA RISDKTAFFL GE+ E+ T +F+ P+D+RTE+YI
Sbjct: 186 EDLIQQLKKDYTIIIVTHNMQQASRISDKTAFFLTGEICEFGDTVDVFTNPKDQRTEDYI 245

Query: 249 TGRFG 253
 +GRFG
Sbjct: 246 SGRFG 250

There is also homology to SEQ ID 1682.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 527 

A DNA sequence (GBSx0565) was identified in S.agalactiae <SEQ ID 1683> which encodes the amino
acid sequence <SEQ ID 1684>. This protein is predicted to be transmembrane protein PstA (pstA-2).
Analysis of this protein sequence reveals the following:

Possible site: 38 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-13.11 Transmembrane 265 - 281 ( 255 - 286)
INTEGRAL Likelihood = -8.81 Transmembrane 79 - 95 ( 68 - 100)
INTEGRAL Likelihood = -4.78 Transmembrane 195 - 211 ( 192 - 213)
INTEGRAL Likelihood = -4.67 Transmembrane 147 - 163 ( 143 - 164)
INTEGRAL Likelihood = -2.92 Transmembrane 122 - 138 ( 120 - 138)
INTEGRAL Likelihood = -0.90 Transmembrane 40 - 56 ( 39 - 56)

Final Results

bacterial membrane Certainty=0.6243 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD22040 GB:AF118229 transmembrane protein PstA [Streptococcus pneumoniae] 
Identities = 135/263 (51%), Positives = 203/263 (76%)

Query: 23 FFLFAI VYLGAHjSFATIAFWIYILVKGLPHVNTGLFAWTYNTQNVSLLPAFINTIFII 82
 [match line illegible in source]
Sbjct: 4 YLLKLIjvYCFSALTFGSIiFLIIGFILIKGLPHLSLSLFSWTYTSENISLMPAIISTVILV 63

Query: 83 ALTLLFAVPLGIGGSIYLTEYARRDNPYLKIIRVATETLAGIPSIIYGLFGALFFVKYTH 142
 LL A+P+GI YL EY ++D+ +KI+R+A++TL+GIPSI++GLFG LFFV +
Sbjct: 64 FGALLLALPIGIFAGFYLVEYTKKDSLCVKIMRLASDTLSGIPSIVFGLFGMLFFWFLG 123

Query: 143 LGLSLISGSLTLSIMILPLIMRTTEEALLSVPDSYREGAFALGAGKLRTIFKIVLPSAMS 202
 SL+SG LT IM+LP+I+R+TEEALLSV DS R+ ++ LGAGKLRT+F+IVLP AM
Sbjct: 124 FQYSLLSGILTSVIMVLPVIIRSTEEALLSVSDSMRQASYGLGAGKLRTVFRIVLPVAMP 183

Query: 203 GIFAGIIIAVGRIIGESAALIFTAGTVAKVAHSVFSSSRTLAVHMYAISGEGLYVDQTYA 262
 GI AG+IIA+GRI+GE+AAL++T GT S+ SS R+LA+HMY +S EGL+V++ YA
Sbjct: 184 GIIjAGVIIAIGRIVGETAAL^TLGTSTNTPSSLMSSGRSIALHMYMLSSEGLHVNEAYA 243

Query: 263 TAVILLLLVIIVNFVSGLVAKRL 285
 T VIL++ V+++N +S L++++L
Sbjct: 244 TGVILIITVLMINTLSSLLSRKL 266





There is also homology to SEQ ID 1686. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 528 

A DNA sequence (GBSx0566) was identified in S.agalactiae <SEQ ID 1687> which encodes the amino 
acid sequence <SEQ ID 1688>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2687 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 529 

A DNA sequence (GBSx0567) was identified in S.agalactiae <SEQ ID 1689> which encodes the amino 
acid sequence <SEQ ID 1690>. This protein is predicted to be transmembrane protein PstC (pstC-2). 
Analysis of this protein sequence reveals the following: 



Possible site: 23 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.67 Transmembrane 256 - 272 ( 251 - 279)
INTEGRAL Likelihood = -8.86 Transmembrane 141 - 157 ( 133 - 162)
INTEGRAL Likelihood = -4.99 Transmembrane 111 - 127 ( 109 - 132)
INTEGRAL Likelihood = -4.30 Transmembrane 76 - 92 ( 72 - 95)
INTEGRAL Likelihood = -1.86 Transmembrane 25 - 41 ( 24 - 42)
INTEGRAL Likelihood = -1.33 Transmembrane 59 - 75 ( 59 - 75)
INTEGRAL Likelihood = -0.27 Transmembrane 203 - 219 ( 202 - 219)



Final Results

bacterial membrane Certainty=0.5267 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD22039 GB:AF118229 transmembrane protein PstC [Streptococcus pneumoniae]
Identities = 162/266 (60%), Positives = 212/266 (78%), Gaps = 3/266 (1%)



Query: 15 ITACVSVISAILICLFLFSSGLPAITKIGWGNFIFGKVWHPSN--NIFGIFPMIVGSLYV 72
          ++A V+V++ +LIC F+FS+GLP I G+ F+ G W P+N +GI PMIVGSL +
Sbjct: 1 MSATVAVVAILLICFFIFSNGLPFIANYGFARFLLGSDWSPTNIPASYGILPMIVGSLLI 60

Query: 73 TAGALLLGGPIGILTAVFMAYFCPENIYKPLKSAINLMAGIPSVVYGFFGLVVIVPMIRQ 132
          T GA+++G P GILT+VFM Y+CP+ +Y LKSAINLMA IPS+VYGFFGL ++VP IR
Sbjct: 61 TLGAIVIGVPTGILTSVFMVYYCPKPVYGFLKSAINLMAAIPSIVYGFFGLQLLVPWIRS 120

Query: 133 YIGGFGMGVLAASILLGIMILPTIVSISESSLRAVPESYYEGGIALGASHERSVFFAVLP 192
           ++G GM VL AS+LLGIMILPTI+S+SES++R VP++YY G +ALGASHERS+F +LP
Sbjct: 121 FLGN-GMSVLTASLLLGIMILPTIISLSESAIRTVPKTYYSGSLALGASHERSIFSVILP 179

Query: 193 A&KRGI]J^WLGIGRAIGETMAVI^WAGNQAVLPQSLTSGVRTLTTNIVMEMGYSSGLH 252
           AA+ GIL++V+LGIGRA+GETMAVI+VASNQ ++P L SG RTLTTNIV+EM Y+SG H
Sbjct: 180 AARSGILSAVILGIGRAVGETMAVILVAGNQPIIPSGLFSGTRTLTTNIVLEMAYASGQH 239

Query: 253 RQALIGTAVVLFIFILMINISFSALQ 278
           R+ALI T+ VLF IL+IN F+ L+
Sbjct: 240 REALIATSAVLFFLILLINAYFAYLK 265





There is also homology to SEQ ID 1692. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 530 

A DNA sequence (GBSx0568) was identified in S.agalactiae <SEQ ID 1693> which encodes the amino 
acid sequence <SEQ ID 1694>. This protein is predicted to be probable hemolysin precursor (pstS). 
Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> May be a lipoprotein 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD22038 GB:AF118229 phosphate binding protein PstS 
[Streptococcus pneumoniae] 
Identities = 134/295 (45%), Positives = 185/295 (62%), Gaps = 9/295 (3%)

Query: 1 MKKHKMLSLLAVSGLMGIGILAGCSNDSSSSSK---GTINIVSREEGSGTRGAFIELFGI 57
         MK KML+L A+ GL G G++A C N S++S + GTI ++SRE GSGTRGAF E+ GI
Sbjct: 1 MKFKKMLTLAAI-GLSGFGLVA-CGNQSAASKQSASGTIEVISRENGSGTRGAFTEITGI 58

Query: 58 ESKNKKGEKVDHTSDAATVTNSTSVMLTTVSKDPSAIGYSSLGSI^SSVKVLKIDGKNAT 117
          K+ +K+D+T+ A + NST +L+ V + +AIGY SLGSIi SVK L+IDG A+
Sbjct: 59 LKKDGD-KKIDNTAKTAVIQNSTEGVLSAVQGNANAIGYISLGSLTKSVKALEIDGVKAS 117

Query: 118 VKDIKSGSYKISRPFNIVTKEGKEKEATKDFIDYILSKDGQAWEKNGYIPL-DNAKAYQ 176
           + G Y + RPFNIV K +DFI +I SK GQ W N +I Y
Sbjct: 118 RDTVLDGEYPLQRPFNIVWSSNLSK-LGQDFISFIHSKQGQQWTDNKFIEAKTETTEYT 176

Query: 177 AKVSSGKVVIAGSSSVTPVMEKIKEAYHKVNAKVDVEIQQSDSSTGITSAIDGSADIGMA 236
           ++ SGK+ + GS+SV+ +MEK+ EAY K N +V ++I + SS GIT+ + +ADIGM
Sbjct: 177 SQHLSGKLSVVGSTSVSSLMEKIAEAYKKENPEVTIDITSNGSSAGITAVKEKTADIGMV 236

Query: 237 SRELDKTESSKGVKATVIATDGIAVVVNKKNKVNDLSTKQVKDIFTGKTTSWSDL 291
           SREL K K + IA DGIAVVVN NK + +S ++ D+F+GK T+W +
Sbjct: 237 SREL-TPEEGKBLTHDAIALDGIAVVVNNDNKASQVSMAELADVFSGKLTTWDKI 290

There is also homology to SEQ ID 1696. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8597> and protein <SEQ ID 8598> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 23 Crend: 4
McG: Discrim Score: 7.91

GvH: Signal Score (-7.5): -3.72

Possible site: 34
>>> May be a lipoprotein

ALOM program count: 0 value: 2.44 threshold: 0.0
PERIPHERAL Likelihood = 2.44 248

modified ALOM score: -0.99

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

SEQ ID 1694 (GBS24) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 9; MW 33kDa).

GBS24-His was purified as shown in Figure 194, lane 10.

Example 531

A DNA sequence (GBSx0569) was identified in S.agalactiae <SEQ ID 1697> which encodes the amino
acid sequence <SEQ ID 1698>. Analysis of this protein sequence reveals the following:

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1725 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 532 

A DNA sequence (GBSx0570) was identified in S.agalactiae <SEQ ID 1699> which encodes the amino 
acid sequence <SEQ ID 1700>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2741 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05069 GB:AP001511 unknown conserved protein [Bacillus halodurans] 
Identities = 119/250 (47%), Positives = 149/250 (59%), Gaps = 9/250 (3%)

Query: 1 MQQYFWGE--AGAYVTIEDKDTIKHMFNVMRLTEDDQVVLVFDDAIKRIAKVVDSSAHR 58
         MQ+YFV E YVTI D +KH+ VMR+T D+ L+ D R + A+
Sbjct: 1 MQRYWPKEQMTDTYVTITGDD-VKHIIKVMRMTIGDE--LICSDGHGRTVRCEIEKAND 57

Query: 59 FQIL----EELDNNVEMPVQVTIASGFPKGDKLDFVTQKATELGAARIWGFPADWSVVKW 114
          ++L E L N E+P++VTIA PKGDKLD++ QK TELGA A W F A S+VKW
Sbjct: 58 SEVLARVIEPLIPNTELPIRVTIAQALPKGDKLDYIVQKGTELGAQAFWPFSASRSIVKW 117

Query: 115 DGKKLAKKEDKLAKIALGAAEQSKRNRLPQVRLFEKKADFQAELAGFDKIFIAYEESAKE 174
           D KK KK ++L KIA AAEQS R R+P + + E++GF K +AYEE AKE
Sbjct: 118 DEKKGRKKTERLMKIAKEAAEQSYRERIPSIETPLAFSKLLQEISGFTKTIVAYEEEAKE 177

Query: 175 GELSALAQNLQTVKAGDKLLFIFGPEGGISPKEIAAFEEVGAIKVGLGPRIMRTETAPLY 234
           G L A L + GD LL I GPEGG + +EI A + G GLGPRI+RTETA LY
Sbjct: 178 GRLMTFAACLNELHHGDSLLVIIGPEGGFTTEEIDAIQRAGGAPAGLGPRILRTETASLY 237

Query: 235 ALSVISYSAE 244
           AL+ ISY E
Sbjct: 238 ALAAISYHFE 247





A related DNA sequence was identified in S.pyogenes <SEQ ID 1701> which encodes the amino acid 
sequence <SEQ ID 1702>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2274 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 173/245 (70%), Positives = 202/245 (81%)

Query: 1 MQQYFWGEAGAYVTIEDKDTIKHMFNVMRLTEDI^ 60
         MQQYF+ G+A VTI DKDTIKHMF VMRL ++ +WLVFDD +K LAKV +S AH +
Sbjct: 1 MQQYFIKGKAEKKVTITDKDTIKHMFQVMRLADEAEVVTi^^ 60

Query: 61 ILEELDNNVEMPVQVTIASGFPKGDKLDFVTQKATELGAAAIWGFPADWSVVKWDGKKLA 120
          I+E L + VE+PV+VTIASGFPKGDKLD + QK TELGA+A+WG+PADWS WKWDGKKLA
Sbjct: 61 IIF^ALPDQVELPVKVTIASGFPKGDKLDTIAQKVTELGASALWGYPADWSWKMDGKKLA 120

Query: 121 KKEDKLAKIALGAAEQSKRNRLPQVRLFEKKADFQAELAGFDKIFIAYEESAKEGELSAL 180
           KKEDKLAKI LGAAEQSKRNR+P+V LFE KA+F L+ FD IFIAYEE+AK G+L+ L
Sbjct: 121 KKEDKLAKIVLGAAEQSKRNRVPEVHLFEHKREFLKSLSSFDHIFIAYEETAKAGQIATL 180

Query: 181 AQMLQTVKAGDKLLFIFGPEGGISPKEIAAFEEVGAIKVGLGPRIMRTETAPLYALSVIS 240
           A+ ++ VK G K+LFIFGPEGGISP EI FE AIKVGLGPRIMR ETAPLYALS +S
Sbjct: 181 AREVKEVKPGAKILFIFGPEGGISPTEITQFEA^SAIKVGLGPRIMRAETAPLYALSALS 240

Query: 241 YSAEL 245
           Y+ EL
Sbjct: 241 YALEL 245

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
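
Each alignment in this Example is summarized by an "Identities" figure, i.e. the number of aligned columns in which both sequences carry the same residue, over the alignment length. A minimal sketch of that count follows, assuming two equal-length aligned strings with "-" as the gap character. The "Positives" figure additionally depends on a substitution matrix and is not reproduced here. `count_identities` is a hypothetical helper, and the percentage is simply truncated to a whole number for display.

```python
def count_identities(query: str, subject: str, gap: str = "-"):
    """Return (identities, gap_columns, alignment_length) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    # A column is an identity when both rows hold the same residue (not a gap).
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    # A column is a gap column when either row holds the gap character.
    gaps = sum(1 for q, s in zip(query, subject) if gap in (q, s))
    return identities, gaps, len(query)


# Last block of the GAS/GBS alignment above (residues 241 to 245).
ident, gaps, length = count_identities("YSAEL", "YALEL")
print(f"Identities = {ident}/{length} ({100 * ident // length}%), Gaps = {gaps}")
```

For this five-residue block the sketch reports three identical columns (Y, E and L), which agrees with the match line "Y+ EL" shown for it.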

Example 533 

A DNA sequence (GBSx0571) was identified in S.agalactiae <SEQ ID 1703> which encodes the amino
acid sequence <SEQ ID 1704>. Analysis of this protein sequence reveals the following:



Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.28 Transmembrane 238 - 254 ( 237 - 254) 



Final Results 

bacterial membrane Certainty=0.1914 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA82791 GB:AB023064 orf35 [Listeria monocytogenes] 
Identities = 138/309 (44%), Positives = 193/309 (61%), Gaps = 5/309 (1%)

Query: 4 WNELTVHVNREAEEAVSNLLIETGSQGVAISDSADYLGQ-EDRFGELYP---EVEQSDMI 59
         W+E+ VH EA E V+N+L E G+ GV+I D AD+L + ED+FGE+Y E D +
Sbjct: 3 WSEVEVHTTNEAVEPVANVLTEFGAAGVSIEDVADFLREREDKFGEIYALRREDYPEDGV 62

Query: 60 AITAYYPDTLDIEAVKADLADRLANFEGFGLATGSVNLDSQELVEEDWADNWKKYYEPAR 119
          I AY+ T+ ++LNF+G ++ +E+WA WKKYY P +
Sbjct: 63 IIKAYFLKTTEFVEQIPEIEQTLKNLSTFDIPLGKFQFVVNDVDDEEWATAWKKYYHPVQ 122

Query: 120
           IT +TIVPSW Y A E II++DPGMAFGTGTHPTT++ + AL L+ G+ VIDVG
Sbjct: 123

Query: 180
           TGSGVLS IAS+ LGAK I A DLD++A R A+ENI +N IV +LL+ + + V
Sbjct: 183

Query: 239
           D++VANILA++++ +D Y+ +K G I SGII +K +V E+ + AG +E QG
Sbjct: 243

Query: 299
           +W A + K+
Sbjct: 303



A related DNA sequence was identified in S.pyogenes <SEQ ID 1705> which encodes the amino acid 
sequence <SEQ ID 1706>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.57 Transmembrane 238 - 254 ( 237 - 257)

Final Results

bacterial membrane Certainty=0.2826 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAA82791 GB:AB023064 orf35 [Listeria monocytogenes]
Identities = 139/309 (44%), Positives = 203/309 (64%), Gaps = 5/309 (1%)

Query: 4 WQEVTVHVHRDAQEAVSHVLIETGSQGVAIADSADYIGQK-DRFGELYP DVEQSDMI 59 

W EV VH +A E V++VL E G+ GV+I D AD++ ++ D+FGE+Y + D + 
Sbjct: 3 WSEVEVHTTNEAVEPVANVLTEFGAAGVSIEDVADFLREREDKFGEIYALRREDYPEDGV 62 

Query: 60 AITAYYPSSTNLADIIATINEQLAEIASFGLQVGQVTVDSQEL^ 119 

I AY+ +T + I I + L L++F + +G+ ++ +E+WA WKKYY P + 

Sbjct: 63 IIKAYFLKTTEFVEQIPEIEQTLKNLSTFDIPMKFQFVVNDVDDEEWATAWKKYYHPVQ 122 

Query: 120 ITHDLTIVPSWTDYDASAGEKVIKLDPGMAFGTGTHPTTKMSLFALEQILRGGETVIDVG 179 

IT +TIVPSW Y SA E +I+LDPGMAFGTGTHPTT++ + AL L+ G+ VIDVG 
Sbjct: 123 ITDRITIVPSWESYTPSANEIIIELDPGMAFGTGTHPTTQLCIRALSNYLQPGDEVIDVG 182 

Query: 180 TGSGVLSIASSLLGAKTIYAYDLDDVAVRVAQDNIDLNQGTDNIHVAAGDLLKGVSQ-EA 238 

TGSGVLS IAS+ LGAK+I A DLD++A R A++NI LN+ I V +LL+ +++ 
Sbjct: 183 TGSGVLSIASAKLGAKSILATDLDEIATRAAEENITLNKTEHIIOTKQNNLLQDINKTNV 242

Query: 239 DVIVANILADILvLLTDDAYRLVKKEGYLILSGIISEKLDMVLEAAFSAGFFLETHMVQG 298 

D++VANILA++++L +D Y+ +K G I SGII +K +V EA +AG +E QG 
Sbjct: 243 DIWANIIAEVILLFPEDWKALKPGGVFIASGIIEDKAKVVEEALKNAGLIIEKMEQQG 302 

Query: 299 EWNALVFKK 307 

+W A++ K+ 
Sbjct: 303 DWVAIISKR 311 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 259/317 (81%), Positives = 287/317 (89%)

Query: 1 MNTWNELTVHVNREAEEAVSNLLIETGSQGVAISDSADYLGQEDRFGELYPEVEQSDMIA 60
         M TW E+TVHV+R+A+EAVS++LIETGSQGVAI+DSADY+GQ+DRFGELYP+VEQSDMIA
Sbjct: 1 METWQEVTVHVHRDAQEAVSHVLIETGSQGVAIADSADYIGQKDRFGELYPDVEQSDMIA 60

Query: 61 ITAYYPDTLDIEAVKADLADRLANFEGFGLATGSVNLDSQELVEEDWADNWKKYYEPARI 120
          ITAYYP + ++ + A + ++LA FGL G V +DSQEL EEDWADNWKKYYEPARI
Sbjct: 61 ITAYYPSSTNLADIIATINEQLAEIASFGLQVGQVTVDSQELAEEDWADNWKKYYEPARI 120

Query: 121 THDLTIVPSWTDYEAKAGEKIIKMDPGMAFGTGTHPTTKMSLFALEQVLRGGETVIDVGT 180
           THDLTIVPSWTDY+A AGEK+IK+DPGMAFGTGTHPTTKMSLFALEQ+LRGGETVIDVGT
Sbjct: 121 THDLTIVPSWTDYDASAGEKVIKLDPGMAFGTGTHPTTKMSLFALEQILRGGETVIDVGT 180

Query: 181 GSGVLSIASSLLGAKDIYAYDLDDVAVRVAQENIDMNPGTENIHVAAGDLLKGVQQEVDV 240
           GSGVLSIASSLLGAK IYAYDLDDVAVRVAQ+NID+N GT+NIHVAAGDLLKGV QE DV
Sbjct: 181 GSGVLSIASSLLGAKTIYAYDLDDVAVRVAQDNIDLNQGTDNIHVAAGDLLKGVSQEADV 240

Query: 241 IVANILADILIHLTDDAYRLVKDEGYLIMSGIISEKWDMVRESAEKAGFFLETHMVQGEW 300
           IVANILADIL+ LTDDAYRLVK EGYLI+SGIISEK DMV E+A AGFFLETHMVQGEW
Sbjct: 241 IVANILADILVLLTDDAYRLVKKEGYLILSGIISEKLDMVLEAAFSAGFFLETHMVQGEW 300

Query: 301 NACVFKKTDDISGVIGG 317
           NA VFKKTDDISGVIGG
Sbjct: 301 NALVFKKTDDISGVIGG 317



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 534

A DNA sequence (GBSx0572) was identified in S.agalactiae <SEQ ID 1707> which encodes the amino 
acid sequence <SEQ ID 1708>. Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4198 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 535 

A DNA sequence (GBSx0573) was identified in S.agalactiae <SEQ ID 1709> which encodes the amino
acid sequence <SEQ ID 1710>. This protein is predicted to be transcriptional activator tipa. Analysis of this
protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0683 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15677 GB:Z99122 transcriptional regulator [Bacillus subtilis] 
Identities = 87/246 (35%), Positives = 139/246 (56%), Gaps = 13/246 (5%)

Query: 4 VKEVSILSGVSVRTLHHYDKIGLFPPTALSEAGYRLYDDEALIRLQEILLFRELEFPLKD 63
         VK+V+ +SGVS+RTLHHYD I L P+AL++AGYRLY D L RLQ+IL F+E+ F L +
Sbjct: 5 VKQVAEISGVSIRTLHHYDNIELLNPSALTDAGYRLYSDADLERLQQILFFKEIGFRLDE 64

Query: 64 IKYLLEQAKEERQDLLAQQIKLLEWKRSHLEQVITHAKR--LQEKGDDYMN----FDVYN 117
          IK +L+ +R+ L Q ++L K+ ++++I R L G + MN F +
Sbjct: 65 IKEMLDHPNFDRKAALQSQKEILMKKKQRMDEMIQTIDRTLLSVDGGETMNKRDLFAGLS 124

Query: 118 KTELEQLQA----EAKEKWGQTAA--YKEFAQKHASDDFAQISQEMAKIMVQFGQLKTQN 171
           ++E+ Q E ++ +G+ A ++ +++DD+ IE I +
Sbjct: 125 MKDIEEHQQTYADEVRKLYGKEIAEETEKRTSAYSADDWRTIMAEFDSIYRRIAARMKHG 184

Query: 172 VSDESVQMCVKRLQDYISQNFYTCTNEILAGLGQMYQSDDRFSQSIDKAGGAGTSEFVSQ 231
           D +Q V +D+I Q Y CT +I GLG++Y +D+RF+ SI++ G G + F+ +
Sbjct: 185 PDDAEIQAAVGAFRDHICQYHYDCTLDIFRGLGEVYITDERFTDSINQY-GEGIAAFLRE 243

Query: 232 AIAYYC 237
           AI YC
Sbjct: 244 AIIIYC 249

A related DNA sequence was identified in S.pyogenes <SEQ ID 1711> which encodes the amino acid
sequence <SEQ ID 1712>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.28 Transmembrane 146 - 162 ( 143 - 167)
INTEGRAL Likelihood = -2.92 Transmembrane 172 - 188 ( 171 - 190)

Final Results

bacterial membrane Certainty=0.4312 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB15677 GB:Z99122 transcriptional regulator [Bacillus subtilis]
Identities = 40/107 (37%), Positives = 69/107 (64%), Gaps = 6/107 (5%)

Query: 7 YSTGEIANLAGVSIRTVQYYDQRGILIPTALTAGGRRLYTDSDLEQLRMICFLRDLGFSI 66
         Y ++A ++GVSIRT+ +YD +L P+ALT G RLY+D+DLE+L+ I F +++GF +
Sbjct: 3 YQVKQVAEISGVSIRTLHHYDNIELLNPSALTDAGYRLYSDADLERLQQILFFKEIGFRL 62

Query: 67 EQIRKVLAEENAAQVLELLLVDHIATAKEDIAAKEQQVDIAVKILDR 113
          ++I+++L N + L + KE L K+Q++D ++ +DR
Sbjct: 63 DEIKEMLDHPNFDRKAAL------QSQKEILMKKKQRMDEMIQTIDR 103

An alignment of the GAS and GBS proteins is shown below: 

Identities = 40/133 (30%), Positives = 71/133 (53%), Gaps = 6/133 (4%)

Query: 6 EVSILSGVSVRTLHHYDKIGLFPPTALSEAGYRLYDDEALIRLQEILLFRELEFPLKDIK 65
         E++ L+GVS+RT+ +YD+ G+ PTAL+ G RLY D L +L+ I R+L F ++ I+
Sbjct: 11 ELANLAGVSIRTVQYYDQRGILIPTALTAGGRRLYTDSDLEQLRMICFLRDLGFSIEQIR 70

Query: 66 YLL--EQAKEERQDLLAQQIKL----LEWKRSHLEQVITHAKRLQEKGDDYMNFDVYNKT 119
          +L E A + + LL I L K ++ + RL+++ ++F +
Sbjct: 71 KVLAEENAAQVLELLLVDHIATAKEDIAAKEQQVDIAVKILDRLRKQDPQSLDFLMDISL 130

Query: 120 ELEQLQAEAKEKW 132
           ++ +A K +W
Sbjct: 131 SMKNQKAWKKLQW 143

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
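
The INTEGRAL lines reported for SEQ ID 1712 in this Example give, for each predicted membrane-spanning segment, a likelihood value and residue coordinates. In the output shown in these Examples, segments flagged INTEGRAL carry negative values, whereas the PERIPHERAL call in Example 530 carries a positive one. A minimal sketch, assuming the line layout "INTEGRAL Likelihood = <value> Transmembrane <start> - <end> ( <outer start> - <outer end>)" and using hypothetical names (`Segment`, `parse_segments`), parses such lines and orders the segments along the sequence:

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


# Assumed layout; the bracketed outer range is ignored in this sketch.
SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return the predicted segments sorted by start position."""
    segments = [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in SEG_RE.finditer(text)
    ]
    return sorted(segments, key=lambda s: s.start)


example = """
INTEGRAL Likelihood = -2.92 Transmembrane 172 - 188 ( 171 - 190)
INTEGRAL Likelihood = -8.28 Transmembrane 146 - 162 ( 143 - 167)
"""
for seg in parse_segments(example):
    print(seg.start, seg.end, seg.likelihood)
```

Sorting by start position rather than by likelihood makes it easier to see that the two predicted segments of SEQ ID 1712 lie next to each other in the sequence.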

Example 536 

A DNA sequence (GBSx0575) was identified in S.agalactiae <SEQ ID 1713> which encodes the amino
acid sequence <SEQ ID 1714>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.06 Transmembrane 57 - 73 ( 57 - 73)

Final Results

bacterial membrane Certainty=0.1022 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14586 GB:Z99117 yrkN [Bacillus subtilis] 
Identities = 38/136 (27%), Positives = 60/136 (43%), Gaps = 3/136 (2%)

Query: 2 ITLQKAEASDLEKIIA-IQRASFKAVYEKYHDQYDPYVEEVEQIRWKLVERPDCFYHFVL 60
         + L+ A+ SDL + +Q A AV E + D D + ++ + P + +L
Sbjct: 9 VILELAKESDLPEFQKKLQEAFAIAVIETFGDCEDGPIPSDNDVQ-ESFNAPGAVVYHIL 67

Query: 61 VDETIVGFLRLVIKDEEKRAWLGTAAILPQYOj3Q^YGSAAMALLEKTYPKLTKWDLCTIA 120
          D VG +I + L + P+Y QG G +A +E YP W+ T
Sbjct: 68 QDGKIWGGAVVRINSQTMmSLDLFYVSPEYHSQGIGLSAWKAIEAQYPDTVLWETVTPY 127

Query: 121 QEKLMVSFY-EKCGYH 135
           EK ++FY KCG+H
Sbjct: 128 FEKRNINFYVNKCGFH 143

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 537 

A DNA sequence (GBSx0576) was identified in S.agalactiae <SEQ ID 1715> which encodes the amino 
acid sequence <SEQ ID 1716>. This protein is predicted to be Bacterial mutT protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2417 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG06568 GB:AE004742 hypothetical protein [Pseudomonas aeruginosa] 
Identities = 57/131 (43%), Positives = 82/131 (62%)

Query: 10 FSGAKIALFCEGKILTSLRDDFPDLPYAGFWDLPGGGREDNETPLECLFREVDEELSLTL 69
          FSGAK+ALF ++ RD+ P +P+ G+WD PGGGRE ETP EC RE++EE S+ L
Sbjct: 7 FSGAKLALFYGDHLWYKRDEKPGIPFPGYWDFPGGGREGLETPAECALRELEEEFSIRL 66

Query: 70 TRNHIDtWKTYRGMLKPDKLSVFMVGHISQKEYDSIVLGDEGQDYKLMSIDEFLSHKKVI 129
          I+W + Y + F+V + +E+++I GDEGQ ++LM +D +L+H +
Sbjct: 67 EEPRIEWQRQYPSTSGSAPFAYFLVARLEDREFEAIRFGDEGQYWRLMEVDAYLAHflMAV 126

Query: 130 PQLQERLRDYL 140
           P LQ RL DYL
Sbjct: 127 PYLQSRLGDYL 137

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 538 

A DNA sequence (GBSx0577) was identified in S.agalactiae <SEQ ID 1717> which encodes the amino 
acid sequence <SEQ ID 1718>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3299 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1719> which encodes the amino acid
sequence <SEQ ID 1720>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5527 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/156 (71%), Positives = 128/156 (81%)

Query: 1 MAKFGFLSVLEEELDKHLQYDFAMDWDKKNHTVEVTFILEAQNSSAIETVDDQGETSSED 60
         MA +GFLSVLEEE+DKH QYD+AMDWDKKNH VEVTF+LEAQN AI+T+DD GE + +D
Sbjct: 1 MATYGFLSVLEEEMDKHFQYDYAMDWDKKNHAVEVTFVLEAQNKEAIKTIDDSGEVTQDD 60

Query: 61 IVFEDYVLFYNPVKSRFDAEDYLVTIPYEPKKGLSREFLAYFAETLNEVATEGLSDLMDF 120
          IVFEDYVLFYNP KS+FDA DYLVTIP++ KKG SREFLAYFA+ LN+VA EG SDLMDF
Sbjct: 61 IVFEDYVLFYNPAKSQFDAADYLVTIPFDAKKGFSREFLAYFAQFLNDVAIEGHSDLMDF 120

Query: 121 LTDDSIEEFGLSWDTDAFENGRAELKETEFYPYPRY 156
           L DDS +F L W+ AFE G+ L+E YPYPRY
Sbjct: 121 LADDSKADFFLEWNAQAFEEGQQGLEEAASYPYPRY 156

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 539 

A DNA sequence (GBSx0578) was identified in S.agalactiae <SEQ ID 1721> which encodes the amino 
acid sequence <SEQ ID 1722>. Analysis of this protein sequence reveals the following:

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2846 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB51273 GB:AL096872 putative acetyltransferase [Streptomyces coelicolor A3(2)]
Identities = 35/109 (32%), Positives = 62/109 (56%), Gaps = 1/109 (0%)

Query: 51 VAEVDDKIAGVLDFGPYYPFPAGKHVATF-GILIAEPYQGQGLGKALLKALLTEAKAQGY 109
          VAE+D + G + G P + HV G+ +A +G G+G+AL++A + EA+ +G+
Sbjct: 56 VAELDGAWGYTOLGFPTPLASNTHTOQIRGLAVAGAARGHGVGRALVRAAVEEARHEGF 115

Query: 110 IKIAMHVMGNNSRAISLYQKYGFTEEARITKAFFIENHYVDALIFAKDL 158
           +I + V+G+N+ A LY+ GF E + F ++ YVD ++ + L
Sbjct: 116 RRITLRVLGHNTAARGLYESEGFWEGVQPEEFHLDGRYVDDVLMGQML 164

A related DNA sequence was identified in S.pyogenes <SEQ ID 1723> which encodes the amino acid 
sequence <SEQ ID 1724>. Analysis of this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0229 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 34/108 (31%), Positives = 59/108 (54%), Gaps = 7/108 (6%) 

Query: 35 TESDLEKNLANGMSFFV-----AEVDDKIAGVLDFGPYYPFPAGKHVATFGILIAEPYQG 89
          T +L L+ + F+ A +D+K+ G+L+ G+ A +L+A+ Y+G
Sbjct: 43 TPQELSDFLSRSQTSFIDFCLLARLDEKVVGLLNLSGEV-LSQGQAEADVFMLVAKTYRG 101

Query: 90 QGLGKALLKALLTEAKAQGYIK-IAMHVMGNNSRAISLYQKYGFTEEA 136
          G+G+ LL+ L A+ YI+ + + V N++AI LY+KYGF E+
Sbjct: 102 YGIGQLLLEIALDWAEENPYIESLKLDVQVRNTKAIYLYKKYGFRIES 149

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 540 

A DNA sequence (GBSx0579) was identified in S.agalactiae <SEQ ID 1725> which encodes the amino 
acid sequence <SEQ ID 1726>. Analysis of this protein sequence reveals the following:

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2056 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14712 GB:Z99118 similar to hypothetical proteins [Bacillus subtilis]

Identities = 248/417 (59%), Positives = 314/417 (74%), Gaps = 4/417 (0%)

Query: 5 LALRMRPRNINEVIGQQHLVGNGKIIDRMVAANMLSSMILYGPPGIGKTSIASAIAGTTK 64
         LA RMRP I ++IGQQHLV KII RMV A LSSMILYGPPGIGKTSIA+AIAG+T
Sbjct: 4 LAYRMRPTKIEDIIGQQHLVAEDKIIGRMVQAKHLSSMILYGPPGIGKTSIATAIAGSTS 63

Query: 65 YAFRTFNATVDSKKRLQEIAEEAKFSGGLVLLLDEIHRLDKTKQDFLLPLLENGNIIMIG 124
          AFR NA +++KK ++ +A+EAK SG ++L+LDE+HRLDK KQDFLLP LENG II+IG
Sbjct: 64 IAFRKLNAVINNKKDMEIVAQEAKMSGQVILILDEVHRLDKGKQDFLLPYLENGMIILIG 123

Query: 125 ATTENPFFSVTPAIRSRVQIFELEPLSNEDIKKAIQLAISDKERGF-PFLVTIDDEALDF 183
           ATT NP+ ++ PAIRSR QIFELEPL+ E IK+A++ A+ D+ RG + V+IDD+A++
Sbjct: 124 ATTANPYHAINPAIRSRTQIFELEPLTPELIKQALERALHDEHRGLGTYSVSIDDQAMEH 183

Query: 184 IVTATNGDLRSAYNSLDLAVMSTSPNEDGSRHISLETMENSLQCSYITMDKNGDGHYDIL 243
           GD+RSA N+L+LAV+ST + DG HI+LET E LQ + DK+GD HYD+L
Sbjct: 184 FAHGCGGDWSALNALELAVLSTKESADGEIHITI^TAEECLQKKSFSHDKDGDAHYDVL 243

Query: 244 SALQKSIRGSDVNASLHYAARLVEAGDLPSLARRLTIIAYEDIGLANPEAQIHTVTALEA 303
           SA QKSIRGSD NA+LHY ARL+EAGDL S+ARRL +IAYEDIGLA+P+A + A++
Sbjct: 244 SAFQKSIRGSDANAALHYIARLIFAGDLESIARRLLVIAYEDIGIASPQAGPRVLNAIQT 303

Query: 304 AQRIGFPEARILIANIVVDLALSPKSNSAYIAMDAALADLRRSGNLPIPRHLRDGHYSGS 363
           A+R+GFPEARI +AN V++L LSPKSNSA LA+D ALAD+R +P+HL+D HY G+
Sbjct: 304 AERVGFPEARIPLANAVIELCLSPKSNSAIIAIDEAIJ^IRAGKIGDVPKHLKDAHYKGA 363

Query: 364 KTLGNARDYKYPHAYPEKWVKQQYLPDKLVGHNYFEANETGKYERALGSNKERIDKL 420
           + LG DYKYPH Y WV+QQYLPD L Y++ +TGK+E AL K+ DKL
Sbjct: 364 QELGRGIDYKYPHNYDNGWVEQQYLPDPLKNKQYYKPKQTGKFESAL---KQVYDKL 417

A related DNA sequence was identified in S.pyogenes <SEQ ID 1727> which encodes the amino acid
sequence <SEQ ID 1728>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2374 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 394/422 (93%), Positives = 409/422 (96%)

Query: 1 MADNLALRMRPRNINEVIGQQHLVGNGKIIDRMVAANMLSSMILYGPPGIGKTSIASAIA 60
         M D+LALRMRP+ I+EVIGQ+HLVG GKII RMV AN LSSMILYGPPGIGKTSIASAIA
Sbjct: 1 MPDHLALRMRPKTISEVIGQKHLVGEGKIIRRMVEANRLSSMILYGPPGIGKTSIASAIA 60

Query: 61 GTTKYAFRTFNATVDSKKRLQEIAEEAKFSGGLVLLLDEIHRLDKTKQDFLLPLLENGNI 120
          GTT+YAFRTFNAT+DSKKRLQEIAEEAKFSGGLVLLLDEIHRLDKTKQDFLLPLLENG I
Sbjct: 61 GTTRYAFRTFNATIDSKKRLQEIAEEAKFSGGLVLLLDEIHRLDKTKQDFLLPLLENGTI 120

Query: 121 IMIGATTENPFFSVTPAIRSRVQIFELEPLSNEDIKKAIQLAISDKERGFPFLVTIDDEA 180
           IMIGATTENPFFSVTPAIRSRVQIFELEPLSNEDIK AIQLAISDKERGFPFLVTIDDEA
Sbjct: 121 IMIGATTENPFFSVTPAIRSRVQIFELEPLSNEDIKTAIQLAISDKERGFPFLVTIDDEA 180

Query: 181 LDFIVTATNGDLRSAYNSLDLAVMSTSPNEDGSRHISLETMENSLQCSYITMDKNGDGHY 240
           LDFIVTATNGDLRSAYNSLDLAVMSTSPNEDGSRHISLETMENSLQ SYITMDKNGDGHY
Sbjct: 181 LDFIVTATNGDLRSAYNSLDLAVMSTSPNEDGSRHISLETMENSLQRSYITMDKNGDGHY 240

Query: 241 DILSALQKSIRGSDVNASLHYAARLVEAGDLPSLARRLTIIAYEDIGLANPEAQIHTVTA 300
           D+LSALQKSIRGSDVNASLHYAARLVEAGDLPSLARRLTIIAYEDIGIJ^P+AQ+HTVTA
Sbjct: 241 DVLSALQKSIRGSDVNASLHYAARLVEAGDLPSLARRLTIIAYEDI^ 300

Query: 301 LEAAQRIGFPEARILIANIVVDLALSPKSNSAYL1AMDAALADLRRSGNLPIPRHLRDGHY 360
           L+AAQRIGFPEARI IAN+V+DLALSPKSNSAYLAMDAALADLR SGNLPIPRHLRDGHY
Sbjct: 301 LDAAQRIGFPFARIPIANWIDIALSPKSNSAYDAMDAALADLRTSGNLPIPRHLRDGHY 360

Query: 361 SGSKTLGNARDYKYPHAYPEKWVKQQYLPDKLVGHNYFEANETGKYERALGSNKERIDKL 420
           +GSK LGNA+DY YPHAYPEKWVKQQYLPDKLVGH+YFEANETGKYERALGSNKERIDKL
Sbjct: 361 AGSKDLGNAKDYLYPHAYPEKWVKQQYLPDKLVGHHYFEANETGKYERALGSNKERIDKL 420

Query: 421 SD 422
           SD
Sbjct: 421 SD 422

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 541 

A DNA sequence (GBSx0580) was identified in S.agalactiae <SEQ ID 1729> which encodes the amino 
acid sequence <SEQ ID 1730>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2991 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10207> which encodes amino acid sequence <SEQ ID
10208> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 542 

A DNA sequence (GBSx0581) was identified in S.agalactiae <SEQ ID 1731> which encodes the amino 
acid sequence <SEQ ID 1732>. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2402 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 543 

A DNA sequence (GBSx0582) was identified in S.agalactiae <SEQ ID 1733> which encodes the amino 
acid sequence <SEQ ID 1734>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.5161 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15891 GB:Z99123 yxlG [Bacillus subtilis]
Identities = 54/203 (26%), Positives = 100/203 (48%), Gaps = 7/203 (3%)

Query: 1 MTGLIPMLKKEWLENSRSHKALALLLISIIFGILGPLTALLMPEIMA--GILPKKLQEAI 58
M ++ +L+KEWLE +S K + L + +I G+ PLT MPEI+A G LP ++ +
Sbjct: 1 MKVMMALLQKEWLEGWKSGKLIWLPIAMMIVGLTQPLTIYYMPEIIAHGGNLPDGMKISF 60

Query: 59 PDPTYLDSYSQYFKNINQLGLILLVFLFSGSLTQEFTRGTLINLITKGLSKKAIILAKFI 118
P+ + N LG+ L++F GS+ E +G ++++ ++ I++K++
Sbjct: 61 TMPSGSEVMVSTLSQFNTLGMALVIFSVMGSVANERNQGVTALIMSRPVTAAHYIVSKWL 120

Query: 119 MMTLIWSISYILGSLTQYAYTLYYFNNHGQHKLIV-YGTSWIFGLLLLSLILFYSVIFRK 177
+ ++I +S+ G YY F+ + G++ + +++ L S IFR
Sbjct: 121 IQSVIGIMSFAAGYGLAYYYVRLLFEDASFSRFAASLGLYALWVIFIVTAGLAGSTIFR- 179

Query: 178 TAGVLIAC---LMTIVAFFISGF 197
+ G AC L V+F + F
Sbjct: 180 SVGAAAACGIGLTAAVSFAVHYF 202

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 544 

A DNA sequence (GBSx0583) was identified in S.agalactiae <SEQ ID 1735> which encodes the amino 
acid sequence <SEQ ID 1736>. This protein is predicted to be an ABC transporter, ATP-binding protein.
Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1344 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15892 GB:Z99123 similar to ABC transporter (ATP-binding protein)
[Bacillus subtilis]
Identities = 116/303 (38%), Positives = 175/303 (57%), Gaps = 18/303 (5%)

Query: 4 ISLQNLSKSFGDQIILNQVSLELEENKIYGFVGPNGAGKTTTIKMILGLLKVDSGTISVM 63
+S+++L KS+ + VS + EN+ +GPNGAGKTTT++M+ GLL SGTI ++
Sbjct: 2 LSIESLCKSYRHHEAVKNVSFHVNENECVALLGPNGAGKTTTLQMLAGLLSPTSGTIKLL 61

Query: 64 GNPVTFGQTKSNQVIGYLPDVPEFYDYMTAQEYLQLC---AGLAQNKTSLPIADLLEQVG 120
G + ++IGYLP P FY +MTA E+L +GL++ K I ++LE VG
Sbjct: 62 GE-----KKLDRRLIGYLPQYPAFYSWMTANEFLTFAGRLSGLSKRKCQEKIGEMLEFVG 116

Query: 121 LADN-QQRISTYSRGMKQRLGLAQALIHNPKILICDEPTSALDPQGRQEILSIISQLRGQ 179
L + +RI YS GMKQRLGLAQAL+H PK LI DEP SALDP GR E+L ++ +L+
Sbjct: 117 LHEAAHKRIGGYSGGMKQRLGLAQALLHKPKFLILDEPVSALDPTGRFEVLDMMRELKKH 176

Query: 180 KTVIFSTHILSDVEKVCDQVLILTKSGIH---NLEDLRDKASASVNQLNLLIKVSDNEAQ 236
V+FSTH+L D E+VCDQV+I+ I L++L+ + +V L++ K+ +
Sbjct: 177 MAVLFSTHVLHDAEQVCDQVVIMKNGEISWKGELQELKQQQQTNVFTLSVKEKLEGWLEE 236

Query: 237 KIALRFPLNQKDQYYKVHLELSFANNREQAIASFYRYLVEQEITPYFIELLEDSLEDFYL 296
K+++ +EL++ L+ +++T E +SLED YL
Sbjct: 237 KPYVSAIVYKNPS--QAVFELPDIHAGRSLLSD----CIRKGLTVTRFEQKTESLEDVYL 290

Query: 297 EVI 299
+V+
Sbjct: 291 KVV 293

There is also homology to SEQ ID 686. 
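
Each alignment summary line above reports counts over the alignment length, which includes gap columns. The following is a minimal sketch for reading such a line; the names `parse_summary` and `percent` are hypothetical and not part of the original analysis, and no assumption is made about how the printed percentages were rounded.

```python
import re

# Matches "Identities = 116/303", "Positives = 175/303", "Gaps = 18/303".
FIELD_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, alignment_length)} from a summary line."""
    return {
        name.lower(): (int(count), int(length))
        for name, count, length in FIELD_RE.findall(line)
    }


def percent(count, length):
    """Unrounded percentage of alignment columns."""
    return 100.0 * count / length


summary = parse_summary(
    "Identities = 116/303 (38%), Positives = 175/303 (57%), Gaps = 18/303 (5%)"
)
for field, (count, length) in summary.items():
    print(f"{field}: {count}/{length} = {percent(count, length):.1f}%")
```

Returning the raw counts rather than a reduced fraction keeps the alignment length available for later comparisons between hits.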

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 545

A DNA sequence (GBSx0584) was identified in S.agalactiae <SEQ ID 1737> which encodes the amino 
acid sequence <SEQ ID 1738>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4383 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



>GP:AAB71491 GB:U53767 ORF6 [Bacillus pumilus] 
Identities = 25/60 (41%) , Positives = 41/60 (67%) 

Query: 2 IGDTILFERTRLGMTQEKLSDYLHLTKATISKWENNQAKPDIDYLILMAKLFDMTLDELV 61 

+G I +R L ++QE +++ L +++ ISKWE NQ++P +D LI +A+LFD + ELV 
Sbjct: 4 LGSNISNKRKSLKLSQEYVAEQLGVSRQAISKWETNQSEPSMDNLIRLAELFDSDIKELV 63 
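
The identity count reported for the alignment above can be checked by comparing the two aligned rows column by column. This is a minimal sketch; the helper name `count_identities` is hypothetical, and it assumes both rows have already been padded with "-" for gaps so that they have equal length (this alignment has no gaps).

```python
def count_identities(query, subject):
    """Count columns where both aligned rows carry the same residue.

    Gap columns ("-") are never counted as identities.
    Returns (identities, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identities, len(query)


# Aligned rows copied from the Query and Sbjct lines above.
query = "IGDTILFERTRLGMTQEKLSDYLHLTKATISKWENNQAKPDIDYLILMAKLFDMTLDELV"
subject = "LGSNISNKRKSLKLSQEYVAEQLGVSRQAISKWETNQSEPSMDNLIRLAELFDSDIKELV"

identities, columns = count_identities(query, subject)
print(f"Identities = {identities}/{columns} ({100 * identities // columns}%)")
```

For these two rows the sketch gives 25 identities over 60 columns, in agreement with the "Identities = 25/60 (41%)" summary line.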



There is also homology to SEQ ID 1740.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 546 

A DNA sequence (GBSx0585) was identified in S.agalactiae <SEQ ID 1741> which encodes the amino 
acid sequence <SEQ ID 1742>. Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4241 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15470 GB:Z99121 yvdC [Bacillus subtilis]
Identities = 59/104 (56%), Positives = 76/104 (72%)

Query: 1 MDITAYQKWVSEFYKKRNWYQYNSFIRSNFLCEEVGELAQAIRKYEIGRDRPDEIEKSNN 60
M + +KW+ EFY+KR W +Y FIR FL EE GELA+A+R YEIGRDRPDE E S
Sbjct: 1 MQLADAEKWMKEFYEKRGWTEYGPFIRVGFLIEEAGELARAVRAYEIGRDRPDEKESSRA 60

Query: 61 ENLNDIKEELGDVLDNIFILADQYNISLEEIIEAHKNKLEKRFE 104
E ++ EE+GDV+ NI ILAD Y +SLE++++AH+ KL KRFE
Sbjct: 61 EQKQELIEEMGDVIGNIAILADMYGVSLEDVMKAHQEKLTKRFE 104

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 547 

A DNA sequence (GBSx0586) was identified in S.agalactiae <SEQ ID 1743> which encodes the amino
acid sequence <SEQ ID 1744>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0453 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06803 GB:AP001517 unknown conserved protein [Bacillus halodurans]
Identities = 87/187 (46%), Positives = 125/187 (66%)

Query: 1 MKIVVFCGASNGNNPIYSQKIVELGEWMIKNNHDLVYGGGKVGLMGVIADTVINNGGQAI 60
MKI VFCG+SNG + +Y + +LG+ + + LVYGG VG+MG +AD+V+ GG+ I
Sbjct: 1 MKIAVFCGSSNGASDVYKEGARQLGKELARRGITLVYGGASVGIMGAVADSVLFAGGEVI 60

Query: 61 GVIPTFLKDREIAHTNLSKLIVVENMPQRKGKMMSLGEAYIALPGGPGTLEEISEVISWS 120
GV+P FL++ EI+H +L+KLIVVE M +RK KM L + ++ALPGGPGTLEE E+ +W+
Sbjct: 61 GVMPRFLEEPEISHPHLTKLIVVETMHERKAKMAELADGFLALPGGPGTLEEFFEIFTWA 120

Query: 121 RIGQNDSPCILYNINGYFNHLESMFDHMWSEGFLSQNDRNNVLFSDDIIEIEKFIKDYQS 180
+IG + PC L NIN YF+ L ++ HM +E FL + R+ L D I + Y+
Sbjct: 121 QIGLHQKPCGLLNINHYFDPLVTLLIHMSNEQFLHEKYRSMALVHTDPILLLDQFSTYEP 180

Query: 181 PTIRKYS 187
PT++ YS
Sbjct: 181 PTVKAYS 187

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 548 

A DNA sequence (GBSx0587) was identified in S.agalactiae <SEQ ID 1745> which encodes the amino 
acid sequence <SEQ ID 1746>. Analysis of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5288 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 549 

A DNA sequence (GBSx0588) was identified in S.agalactiae <SEQ ID 1747> which encodes the amino 
acid sequence <SEQ ID 1748>. This protein is predicted to be an integrase. Analysis of this protein sequence
reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3685 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF12706 GB:AF066865 integrase [bacteriophage TPW22]
Identities = 106/377 (28%), Positives = 199/377 (52%), Gaps = 31/377 (8%)

Query: 4 ARYRRRGNQNLWAYEIREEGKTVAYNS----GFKTKKLAEAEAEPILQKLRTGSIITKNI 59
A +R+RG W + + + Y G+KTKK AEA A+ ++L S +I
Sbjct: 2 ANFRKRGKT--WQFRLSYKDNNGEYKKFEKGGYKTKKiaEAAADEAKKRLNNHSEFDNDI 59

Query: 60 SLPELYQEWLDLKIMPSNRSDVTKKKYLSRKVTLEKLFGDKPISQIRPSEYQRIMNNYGQ 119
SL + +++W + P + ++ T + Y ++K DKPI++I P+ YQ ++N
Sbjct: 60 SLYDFFEKWAKVYKKP-HVTFATmTYKRTLNLIDKYIKDKPIAEITPTFYQAVLNKMSL 118

Query: 120 RVSRNFLGRLNTGVKQSLQMAIADKVM1EDFTQNVELFSTVKSQDADSKYLHSEKAYLDL 179
+ L + +K ++++A+ +KV+ E+F + S + ++ + KYLH+++ YL L
Sbjct: 119 LYRQESLDKFYFQIKSAMKIAVHEKVISENFADFTKAKSKLAARPVEEKYLHADE-YLKL 177

Query: 180 INAVKDKFNYKKSVVPYIIYFLLKTGMRYGELIALTWEDIDFDKGIFKTYRRFN-SETSQ 238
+ ++K Y + Y TGMR+ EL+ LTW +DFDK R ++ S T+
Sbjct: 178 LAIAEEKMEYTSY---FACYLTAVTGMRFAELLGLTWSHVDFDKKEISIQRTWDYSITNN 234

Query: 239 FVPPKNKTSIRIVPVDNECLEILKNLKIEQNQSNKELGLQNTNNMVFQHFGYPNSVPSTN 298
F KN++S R +P+ ++ +++LK K KE+N+V+ SN
Sbjct: 235 FAETKNESSKRKIPISSKTIKLLKKYK-------KEYWHENKYDRVIYNL-------SNN 280

Query: 299 GTNKVLRGIVQEI^IEPIITTKGARHTYGSFLWHRGYDLGIIAKILGHKDISMLIEVYGH 358
G NK ++ ++ + P RH++ S+L ++G DL ++K+LGH+++++ ++VY H
Sbjct: 281 GLNKTIK-VIAGRKVHP----HSLRHSFASYLIYKGIDLLTVSKLLGHENLNVTLKVYAH 335

Query: 359 TLEEKIQEEYNEIKQLW 375
L+E QE + I++++
Sbjct: 336 QLKEMEQENNDVIRKIF 352

There is also homology to SEQ ID 578. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 550 

A DNA sequence (GBSx0589) was identified in S.agalactiae <SEQ ID 1749> which encodes the amino 
acid sequence <SEQ ID 1750>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2710 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 551

A DNA sequence (GBSx0590) was identified in S.agalactiae <SEQ ID 1751> which encodes the amino 
acid sequence <SEQ ID 1752>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2534 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA06248 GB:D29979 ORF3 [Bacillus stearothermophilus]
Identities = 81/263 (30%), Positives = 135/263 (50%), Gaps = 14/263 (5%)

Query: 65 MGVHVELKGQGCRQYEEFIEGNDNNWTSLVKRLI-DNNSNFTRLDIANDIFDESLNVQRL 123
MG+HVE+ GQGCR +E NW L RL+ + N TRLD+A D F + L
Sbjct: 1 MGIHVEMTGQGCRLFELH---TSINWYELFYRLVYEYEVNITRLDVAVDDFKGYFKINTL 57

Query: 124 YEYSKKGLCITTARHAEYHEKFVIDSGELVGETVVFGARGNQQWCIVYNKLMEQNGKLQTD 183
+ K + + A + E VI+ GE +G T+ FGA + + + E+N ++ D
Sbjct: 58 VKKLKDDEVTSRFKKARHIENIVIEGGETIGHTLYFGAPSSD---IQVRFYEKNVQMGMD 114

Query: 184 IDINSWVRAELRCWQEKANLIAHQL-NDMRPLASIYFEAINGHYRFVSPKARDKNKRRRE 242
ID+ W R E++ ++A+++A + +D+ PL I + + +F + KA DKNK+R
Sbjct: 115 IDV--WNRTEIQLRDDRAHVVAQIIADDVLPLGEIVAGLLRNYIQFRTRKATDKNKKRWP 172

Query: 243 SWWWQNYINTEEKTRLSITOEKPTLRQSEaOTDKQVSKTIAKVYMAKYEAYGIDQAEVF 302
R+W N++ + R++ K ++ + WD QVSK+ +Y E ++ + F
Sbjct: 173 LARFWLNFLGDVQPLRIAKQMPKTSIEKKYRWIDSQVSKSFFMIYYCLNE----EEKQRF 228

Query: 303 LQDLLRRGVEKFTDNDEKEIEQY 325
+ D+L G K T D + I Q+
Sbjct: 229 IDDVLAEGASKLTKADLQVINQF 251

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 552 

A DNA sequence (GBSx0591) was identified in S.agalactiae <SEQ ID 1753> which encodes the amino
acid sequence <SEQ ID 1754>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2700 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 553

A DNA sequence (GBSx0592) was identified in S.agalactiae <SEQ ID 1755> which encodes the amino 
acid sequence <SEQ ID 1756>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3121 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1757> which encodes the amino acid 
sequence <SEQ ID 1758>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 19/52 (36%), Positives = 33/52 (62%)

Query: 8 FGPNLTRLRKERGISQVELSNQLQIGKQSISDYEKQKAFPTFANIJ3KIAEYF 59
F NL L ++ I Q+++ N+L I K +I+ Y K ++ PT N+ K+A++F
Sbjct: 15 FSTNLNMLMAKKNIKQIDIHNKLGIPKSTITGYVKGRSLPTAGNVQKLADFF 66

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 554 

A DNA sequence (GBSx0593) was identified in S.agalactiae <SEQ ID 1759> which encodes the amino 
acid sequence <SEQ ID 1760>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA98584 GB:L44593 ORF536; putative [Lactococcus phage BK5-T]
Identities = 248/532 (46%), Positives = 359/532 (66%), Gaps = 16/532 (3%)

Query: 1 MNFIEQISENNQFPIIFVGSGITQRYFENAPTWEKLLKDIWLELFDEESYYAK--AFELR 58
MNFIE I +NNQFPIIFVGSG+T+RYF+N WE+LL ++W + +E+++Y + FE
Sbjct: 1 MNFIENIKDNNQFPIIFVGSGVTKRYFKNGLKWEQLLLELWNLVEEEKAFYTQYHVFENL 60

Query: 59 ERFEN-----NDFDIYTNLASLLEKEVSKAFINGNIQVDNLDLKTAYELNISPFKQLVAN 113
+ +N +F+I +A +LE++++ AF + + +DNL L A+ +ISPF+Q +AN
Sbjct: 61 LKSKNLSKSDKEFEINLMMAGILEEKINNAFYSDELNIDNLTLAQAHTEHISPFRQCIAN 120

Query: 114 RFSNLKIREEKIEEIKQFSQMLSKARIIITTNYDNFIEECLKTINVSVKINVGNKGLFLK 173
FSNL ++ EE I FS+ML KAR I+TTNYDNFIEEC NVS+K+NVGN GLF+K
Sbjct: 121 TFSNLDRKKGFDEEIISFSKMLVKARFIVTTNYDNFIEECFSKRNVSIKVNVGNSGLFVK 180

Query: 174 SSDYGELYKIHGTVDDASTITITKEDYEKNVTKSALINAKILSNLVESPILFLGYSLTDE 233
S+DYGELYKIHG+V + +TI IT EDY+ N +K AL+NAKILSNL ESPILF+GYSLTD+
Sbjct: 181 SNDYGELYKIHGSVKNPNTICITSEDYKNNESKIALVNAKILSNLTESPILFIGYSLTDK 240

Query: 234 NIRKLLTDFAENSPFDISESAQKIGVVEYLPDSESIETVVSSLPDLSVYYSCLKTDNFTN 293
NIR+LLT ++EN P++ISE+A +IGVVEY PD I+ +VS++PDL ++Y+ + TDN+
Sbjct: 241 NIRELLTSYSENLPYEISEAAARIGVVEYTPDKIEIQDIVSNIPDLGIHYTKISTDNYKK 300

Query: 294 IYRLISKINQGFLPSEIAKYENVFRKIIEVKGESKDLKTVLTSYEDLANLTEDEIRSKNI 353
IY IS+I QG+LPSEIAK+E FRKIIEVKG+ K+L TVLTS+ D++ + +E+++KNI
Sbjct: 301 IYDEISQIEQGYLPSEIAKFEGAFRKIIEVKGKEKELDTVLTSFIDISKINTEELKNKNI 360

Query: 354 VVAFGDERYIYKFPDFKEYVRSYFLDKETIPQEIVIRFIATQPVASHLPIKKYMFAMSEY 413
VVAFGD +YIYK P +K+Y+R YF + + I + F+ + +P KK+M + +
Sbjct: 361 VVAFGDSKYIYKMPTYKDYIREYFSNSMELDTRIALLFLKKRSANYPVPYKKHMGVIESW 420

Query: 414 --ISKDSNKYTENIKKRLSKEEELSLDDFTSSIGVPLL--HSKTLERQTEIVGILE-ADV 468
I D + E++K R+S E + ++ L + L + + I++ ++V
Sbjct: 421 GSIPNDLVQEVESLKTRISNFPESIVRTYSIKANKDLAKKYLPYIMKTSTIEDVMSLSNV 480

Query: 469 PDNVRYNFIATHIKNFPKEELFLLVEKIID----EGIFETSRRRFLKAFDLL 516
P + FI IF EEL + K ID +GI T R+ + ++ ++
Sbjct: 481 PLYNKLRFILFKIDKFKVEELKDFIVKNIDMGEGKGISSTLYRKIVMSYSII 532

A related GBS gene <SEQ ID 8599> and protein <SEQ ID 8600> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 1.55
GvH: Signal Score (-7.5): 0.27

Possible site: 54
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 2.44 threshold: 0.0
PERIPHERAL Likelihood = 2.44 214
modified ALOM score: -0.99

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

47.3/69.1% over 531aa 

Lactococcus lactis 

EGAD|36707| hypothetical protein Insert characterized

GP|928833|gb|AAA98584.1||L44593 ORF536; putative {Lactococcus lactis phage BK5-T} Insert
characterized

PIR|T13261|T13261 hypothetical protein 536 - phage BK5-T Insert characterized
ORF00184(301 - 1848 of 2154)

EGAD|36707|38110(1 - 532 of 536) hypothetical protein {Lactococcus
lactis}GP|928833|gb|AAA98584.1||L44593 ORF536; putative {Lactococcus lactis phage BK5-T}
PIR|T13261|T13261 hypothetical protein 536 - Lactococcus lactis phage BK5-T
%Match =32.3 

%Identity =47.2 %Similarity =69.0 

Matches = 247 Mismatches = 155 Conservative Sub.s = 114 

126 156 186 216 246 276 306 336 

RMLILKAFYLAKFLKYYC*KK*CGTKRGQLYFRWGLIIKINKMVSKML**D* 

=1111 I =111 
MNFIENIKDNNQ 
10

366 396 426 456 474 495 525 555 

FPIIFVGSGITQRYFENAPTWEKLLKDIWLELFDEESYYAK--AFE--LRER FENNDFDIYTNLASLLEKEVSKAFI 

llllllllhhllhl 11 = 11 = = l = =l = = = l = II 1= = = =1 = 1 =1 =11 = = = = II

FPIIFVGSGVTKRYFKNGLKWEQLLLELWNLVEEEKAFYTQYHVFENLLKSKNLSKSDKEFEINLMMAGILEEKINNAFY 
30 40 50 60 70 80 90 

585 615 645 675 705 735 765 795 

NGNIQVDNLDLKTAYELNISPFKQLVANRFSNLKIREEKIEEIKQFSQMLSKARIIITTNYDNFIEECLKTINVSVKINV

= = =in i i= =1111 = 1 =n mi == iii 11 = 11 iii i = iiiiiiiini= 111 = 1 = 11

SDELNIDNLTLAQAHTEHISPFRQCIANTFSNLDRKKGFDEEIISFSKMLVKARFIVTTNYDNFIEECFSKRNVSIKVNV
110 120 130 140 150 160 170 

825 855 885 915 945 975 1005 1035

GNKGLFLKSSDYGELYKIHGTVDDASTITITKEDYEKNVTKSALINAKILSNLVESPILFLGYSLTDENIRKLLTDFAEN

II 111=11=1111111111=1 = =11 II 111= I =1 Ihllllllll 111111=111111=111=111 ==ll
GNSGLFVKSNDYGELYKIHGSVKNPNTICITSEDYKNNESKLALVNAKILSNLTESPILFIGYSLTDKNIRELLTSYSEN
190 200 210 220 230 240 250

1065 1095 1125 1155 1185 1215 1245 1275 

SPFDISESAQKIGVVEYLPDSESIETVVSSLPDLSVYYSCLKTDNFTNIYRLISKINQGFLPSEIAKYENVFRKIIEVKG

l = = lll = l =111111 II 1= =ll = = lll -• = != : 111= II IN l|:||llll|:| lllllllll
LPYEISEAAARIGVVEYTPDKIEIQDIVSNIPDLGIHYTKISTDNYKKIYDEISQIEQGYLPSEIAKFEGAFRKIIEVKG
270 280 290 300 310 320 330

1305 1335 1365 1395 1425 1455 1485 1515 

ESKDLKTVLTSYEDLANLTEDEIRSKNIVVAFGDERYIYKFPDFKEYVRSYFLDKETIPQEIVIRFIATQPVASHLPIKK

= 1 = 1 11111= l = = = =l = = = lllllllll =1111 I :H = I II = = I = |: : =111
KEKELDTVLTSFIDISKINTEELKNKNIVVAFGDSKYIYKMPTYKDYIREYFSNSMELDTRIALLFLKKRSANYPVPYKK

350 360 370 380 390 400 410 

1569 1599 1629 1683 1710 1740 

YMFAMSEY--ISKDSNKYTENIKKRLSKEEELSLDDFTSSIGVPLLHS--KTLERQTEIVGILE-ADVPDNVRYNFIATH
:| , , | | , |::| |,| | : :: | | : : | :: ::|| : ||

HMGVIESWGSIPNDLVQEVESLKTRISNFPESIVRTYSIKANKDIAKKYLPYLNKTSTIEDVMSLSNVPLYNKLRFILFK
430 440 450 460 470 480 490 

1770 1818 1848 1878 1908 1938 1968 

IKNFPKEELFLLVEKIID EGI FETSRRRFLKAFDLLHY*IKKSQHCYAMRDFFWTTINKRYENNCFYLTSILQYIF

1 I 111 == 1 II =11 ||:::::: I 

IDKFKVEELKDFIVKNIDMGEGKGISSTLYRKIVMSYSIITEGI 
510 520 530 

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 8600 (GBS142) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 5; MW 54kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 33 (lane 6; MW 79.8kDa). 
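
As a rough plausibility check, the two apparent molecular weights reported above for the same protein can be compared. The sketch below is illustrative only: the approximate GST mass of 26 kDa is general knowledge rather than a figure from this document, the tolerance is arbitrary, the small His tag is ignored, and the function names are hypothetical.

```python
GST_TAG_KDA = 26.0    # assumption: approximate mass of the GST moiety
TOLERANCE_KDA = 2.0   # assumption: arbitrary allowance for gel-based estimates


def fusion_difference(his_kda, gst_kda):
    """Difference between the GST-fusion and His-fusion apparent MW."""
    return gst_kda - his_kda


def consistent_with_gst(his_kda, gst_kda):
    """True if the difference is close to the assumed GST tag mass."""
    return abs(fusion_difference(his_kda, gst_kda) - GST_TAG_KDA) <= TOLERANCE_KDA


# Values reported above for GBS142: His-fusion 54 kDa, GST-fusion 79.8 kDa.
print(round(fusion_difference(54.0, 79.8), 1), consistent_with_gst(54.0, 79.8))
```

A difference far from the assumed tag mass would suggest checking the lane assignment or the construct, not necessarily an error in the reported values.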

The GBS142-GST fusion product was purified (Figure 195, lane 3) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 249). These tests confirm that the protein is
immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 555 

A DNA sequence (GBSx0594) was identified in S.agalactiae <SEQ ID 1761> which encodes the amino
acid sequence <SEQ ID 1762>. This protein is predicted to be an integrase. Analysis of this protein sequence
reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2933 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA98585 GB:L44593 integrase [Lactococcus phage BK5-T]
Identities = 124/382 (32%), Positives = 202/382 (52%), Gaps = 21/382 (5%)

Query: 1 MATYRQRGKKKLWDYRIFTIEKSELVA-SGSGFKTKREAMNEAMRIE---QQKLLVNSISS 56
MAI x + +K(al\. W x 1 is. J_i +- tjrr lis. +A rirtiVl lUi ++ +V+ J.
Sbjct: 1 MATYQKRGKT--WQYfc>±bRI KQbLPRlji Kfc>(jFb lis^DAyAfciAMDlbbKJjKKLjt' 1 VJJPlKy 58

Query: 57 DITLYDL-WFEWYSLIIKPSNLMTTKNKYFTRGSVIRKLFGNQKMKIKHSAYQRKIOT 115
+ J.+ 1 Willi it + + a 1 1 ++ N + 1 b+iyK UN
Sbjct: 59 EISEYFKDWMELY KKNAIDEMTYKCaYEQTLKYIjKTYMPISlVljlSlillAS^ 114

Query: 116 YAEKYTKNHVRRLNSDIKKAIQFAKRDGVLLSDFTDGVVIAGRKFVKDADDKYLHSIFD- 174
+AE + K + ++ ++ +IQ +G L DFT V+ G K DK+++ FD
Sbjct: 115 FAETHAKASTKGFHTRVRASIQPLIEEGRLQKDFTTRAVVKGNGNDKAEQDKFVN--FDE 172

Query: 175 YKKVISYLENNLD--YSNSIVYYLLLVLFKTGLRVGEALALTWDDVNFEDLEIKTYR--R 230
YK+++ Y N L+ YS+ + +++ + TG+R EA L WDD++F + IK R
Sbjct: 173 YKQLVDYFRNRLNPNYSSPTMLFIISI---TGMRASEAFGLVWDDIDFNNNTIKCRRTWN 229

Query: 231 FSGDKGTFSPPKTKTSIRTIPISQSLALILRDLKDDQQVMLKNLKIVNMNNQIFYDYRYG 290
+ G F PKT IR I I +L+D ++ Q+ + ++L I +++ + Y
Sbjct: 230 YRNKVGGFKKPKTDAGIRDIVIDDESMQLLKDFREQQKTLFESLGIKPIHDFVCYHPYRK 289

Query: 291 VSTNSAINKSLKNVLKILNINSKMTATGARHTYGSYLLAKGVDIWVVARLMGHKDITQLL 350
+ T SA+ +L + LK LNI++ +T G RHT+ S LL GVDI V++ +GH +
Sbjct: 290 IITLSALQNTLDHALKKLNISTPLTIHGLRHTHASVLLYHGVDIMTVSKRLGHASVAITQ 349

Query: 351 ETYGHVLTEVINKEYETVRSLV 372
+TY H++ E+ NK+ + + L+
Sbjct: 350 QTYIHIIKELENKDKDKIIELL 371

There is also homology to SEQ ID 578. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 556 

A DNA sequence (GBSx0595) was identified in S.agalactiae <SEQ ID 1763> which encodes the amino 
acid sequence <SEQ ID 1764>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1603 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10209> which encodes amino acid sequence <SEQ ID
10210> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB07266 GB:AP001519 unknown conserved protein in others
[Bacillus halodurans]
Identities = 26/71 (36%), Positives = 39/71 (54%), Gaps = 6/71 (8%)

Query: 37 WWDIDNLQELLGIGRSKLINDILLNPDIKKEVDLSINPNGFIVYPKGKGSRYKILATK-- 94
WW + +L+E G L +ILL+P K +D I GF+ YP+ KG R+ +A+
Sbjct: 4 WWSMQDLKERTGYSEDWLKENILLHPRYKPMLD--IENGGFVYYPEKKGERWCFIASSME 61

Query: 95 --ARKYFEDNF 103
+KYF+D F
Sbjct: 62 EFLKKYFKDIF 72

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 557

A DNA sequence (GBSx0596) was identified in S.agalactiae <SEQ ID 1765> which encodes the amino 
acid sequence <SEQ ID 1766>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -3.88 Transmembrane 12 - 28 ( 11 - 29)

Final Results

bacterial membrane --- Certainty=0.2550 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB99663 GB:U67604 chromosome segretation protein (smcl)
[Methanococcus jannaschii]
Identities = 53/210 (25%), Positives = 95/210 (45%), Gaps = 33/210 (15%)

+F +G+L N + + + + + K++DE S I+ K LI
Sbjct: 133 LFRRLGLLGDNVISQGDLLKIINISPIERRKIIDEISGIAEFDEKKKKAEEELKKARELI 192

Query: 70 Q------SNLSNNIEKNKQELVQKNSYVK--EDTKYIRDEMLIEKKSK-----EEVYNHV 116
+ S + NN++K K+E Y+K E+ K + ++++K S E + N +
Sbjct: 193 EMIDIRISEVENNLKKLKKEKEDAEKYIKLNEELKAAKYALILKKVSYLNVLLENIQNDI 252

Query: 117 KNGDKLIEKMAFANELILKFGEVSRENQMLGLKVNSLEEKIVDLSNQPKNDEISKLRKSI 176
KN ++L NE + K E+ E + L L++N+ I++ N+ N+E+ +L KSI
Sbjct: 253 KNLEEL------KNEFLSKVREIDVEIENLKLRLNN----IINELNEKGNEEVLELHKSI 302

Query: 177 SSFERELSRFEDVGYSEAEEIKSTLRRILN 206
E E+ + V S E+K I N
Sbjct: 303 KELEVEIENDKKVLDSSINELKKVEVEIEN 332

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1766 (GBS315) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 42 (lane 4; MW 26.7kDa) and in Figure 239 (lane 5; MW 41kDa). It was also
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 47 
(lane 5; MW 52kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 558

A DNA sequence (GBSx0597) was identified in S.agalactiae <SEQ ID 1767> which encodes the amino 
acid sequence <SEQ ID 1768>. This protein is predicted to be a surface protein. Analysis of this protein
sequence reveals the following:

Possible site: 26

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.70 Transmembrane 229 - 245 ( 226 - 248)

Final Results

bacterial membrane --- Certainty=0.4079 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA47097 GB:X66468 orf iota [Streptococcus pyogenes]
Identities = 90/262 (34%), Positives = 138/262 (52%), Gaps = 26/262 (9%)

Query: 4 VKVLSLITV-SGLFLMAGNLSASADVVISGGDTIMLSGVDAGVSDSIMPPPSSINPV--- 59
+K L+L+T+ S L++ + + AD S D +L+ D V P + ++PV
Sbjct: 1 MKKLALLTLFSTTLLVSAPIVSFADETASSSDINILADDDPVVPVEPTDPTTPVDPVDPV 60

Query: 60 -----------TDTTEPSAPTPSTDPI--TDTTEPSAPTPSTDPI--TDTTEPSAPTPST 104
T+ TEP+ PT T+P T+ TEP+ PT T+P T+ TEP+ PT T
Sbjct: 61 DPVDPVDPVDPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPT 120

Query: 105 DQTTGTTDSS-TPSSSTTNPVDGITDNGTKPNAGIDKPSTNKPSDHSESSI--KPVTKPT 161
+ TT + TSTP+ T+P + +PS +E ++ KPV
Sbjct: 121 EPTEPTEPTEPTEPSKPTEPTE--PSKPTEPTEPTEPSKPTEPSKPTEPTVPNKPVDTNP 178

Query: 162 INQPITTVTGDQVIGTQDGKALVQTPSGTQLK-DAAEVGGNVQKDGTVAIKKSDGKIEVL 220
I P+ T TG ++ +D K ++Q GT K +A E+G +VQKDGTV +K SDGK++VL
Sbjct: 179 IENPYOTDTGVVIVAVEDSKPIIQLADGTTKKVEAKEIGADVQKDGTVTVKGSDGKM 238

Query: 221 PKTGEGKTI-FTIVGLLLIAGA 241
PKTGE I +++G L++ G+
Sbjct: 239 PKTGETANIALSVLGSLMVLGS 260

There is also homology to SEQ ID 760. 

SEQ ID 1768 (GBS141) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 19 (lane 4; MW 35kDa). The GBS141-His fusion product was purified (Figure
194, lane 3) and used to immunise mice. The resulting antiserum was used for FACS (Figure 295), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 559

A DNA sequence (GBSx0598) was identified in S.agalactiae <SEQ ID 1769> which encodes the amino 
acid sequence <SEQ ID 1770>. Analysis of this protein sequence reveals the following: 



Possible site: 18

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8601> and protein <SEQ ID 8602> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 14.39 
10 GvH: Signal Score (-7.5): -1.23 

Possible site: 18 
>» Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 8.96 threshold: 0.0 
PERIPHERAL Likelihood = 8.96 104 
15 modified ALOM score: -2.29 

*** Reasoning Step: 3 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
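
The "Final Results" lines above follow a fixed pattern (compartment, certainty value, call), so they can be tabulated mechanically. The following is only a minimal sketch of such a parser; the function names `parse_final_results` and `best_site` are hypothetical and are not part of the analysis software used in this document. It assumes that stray spaces inside the certainty value are transcription noise and can be removed.

```python
import re

# Hypothetical helper, not part of the original analysis pipeline.
# Matches lines such as:
#   bacterial outside Certainty=0.3000 (Affirmative) < succ>
LINE_RE = re.compile(
    r"bacterial\W+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d. ]+?)\s*\((?P<call>[^)]+)\)"
)

def parse_final_results(text):
    """Return {compartment: (certainty, call)} for each 'Final Results' line."""
    results = {}
    for match in LINE_RE.finditer(text):
        # Assumption: spaces inside the number (e.g. '0 . 0000') are noise.
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("call").strip())
    return results

def best_site(results):
    """Return the compartment with the highest certainty value."""
    return max(results.items(), key=lambda item: item[1][0])[0]

sample = """bacterial outside Certainty=0 .3000 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>"""
parsed = parse_final_results(sample)
print(parsed)
print(best_site(parsed))
```

Applied to the three lines above, the sketch reports "outside" as the highest-certainty compartment, in agreement with the cleavable N-terminal signal sequence prediction.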

SEQ ID 1770 (GBS17) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 4 (lane 2; MW 24kDa).

The His-fusion protein was purified as shown in Figure 189, lane 10. 
Example 560 

A DNA sequence (GBSx0599) was identified in S.agalactiae <SEQ ID 1771> which encodes the amino 
acid sequence <SEQ ID 1772>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS gene <SEQ ID 10779> and protein <SEQ ID 10780> were also identified. A further related 
GBS nucleic acid sequence <SEQ ID 10957> which encodes amino acid sequence <SEQ ID 10958> was 
also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1772 (GBS643) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 129 (lane 2-4; MW 79kDa) and in Figure 186 (lane 2; MW 79kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 129
(lane 5-7; MW 54kDa) and in Figure 176 (lane 5; MW 54kDa). 

GBS643-GST was purified as shown in Figure 236, lane 7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 561 

A DNA sequence (GBSx0600) was identified in S.agalactiae <SEQ ID 1773> which encodes the amino 
acid sequence <SEQ ID 1774>. Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5815 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 562 

A DNA sequence (GBSx0601) was identified in S.agalactiae <SEQ ID 1775> which encodes the amino 
acid sequence <SEQ ID 1776>. This protein is predicted to be membrane protein. Analysis of this protein 
sequence reveals the following: 
Possible site: 33 

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood =-13.32 Transmembrane 311 - 327 ( 282 - 332)
INTEGRAL Likelihood =-10.46 Transmembrane 293 - 309 ( 282 - 310)
INTEGRAL Likelihood = -8.55 Transmembrane 390 - 406 ( 388 - 410)
INTEGRAL Likelihood = -7.64 Transmembrane 49 - 65 ( 40 - 69)
INTEGRAL Likelihood = -5.68 Transmembrane 100 - 116 ( 98 - 122)
INTEGRAL Likelihood = -4.35 Transmembrane 130 - 146 ( 127 - 148)
INTEGRAL Likelihood = -3.88 Transmembrane 344 - 360 ( 342 - 363)


Final Results 

bacterial membrane Certainty=0.6328 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB70618 GB:AJ243106 membrane protein [Streptococcus thermophilus] 
Identities = 234/665 (35%) , Positives = 379/665 (56%) , Gaps = 59/665 (8%) 

Query: 13 FAKVKDVDIFALKAYMEITH-GAETGAQSILLDVFVNFPFFLLNLIVGLFSVILRFFENF 71

FAK+K VDIF+LK+YME T+ G+ GA ++ ++FVN FF+LN +VG FS+++R E 
Sbjct: 5 FAKLKGVDIFSLKSYMEPTNFGSFNGAWVLINELFVNLFFFILNAWGFFSLLIRILEKI 64 

Query: 72 SLYDTYKQTVYHSSQKLWENLSGN--GSYTS-SLLYLLVAISAFSIFISYLFSKGDFSKR 128 

LY TYK V+H + +W +G+ G+ T+ SL+ L+ + AF +F Y FSKG FS+ 
Sbjct: 65 DLYATYKTYVFHGASSIWHGFTGSNTGNITNKSLVGTLLLVLAFYLFYQYFFSKGSFSRT 124 

Query: 129 LIHLFWIILGMGYFGTIQSTSGGIYILDTVHQIAGSFSDAVTNLSLDNPSGGKTKITQK 188 

L+H+ +V++L +GYFGT+ TSGG+Y+LDTV+ ++ + + + +D KI + 

Sbjct: 125 LLHVCLVLLIiALGYFGWAGTSGGLYLLDTVNNVSKDOTKKIAGIKVDYAKDKSIKIGK- 183 

Query: 189 SSVADNYVMKTSYTAYLFVNTGQLNGKFHNNQTGKEEKFDNEQVLGKYDKSGKFITPKQK 248 
S++D+Y+ +TSY AY+FVNTGQ NGK+ N+Q GKEE FD+ +VLG DK+G F K K
Sbjct: 184 -SMSDSYIAETSYKA.YVFVNTGQENGKYKNSQDGKEEAFDDSKVLGTSDKNGNFKAVKAK 242

Query: 249 DIRreTDNLGDKATEGEEKNRWLSAVNDYLWIKSGYVILKIFEAVILAVPLILIQLIAFM 308
+ Y D+LG+ A + EKNRW+SA+ D+++ + YVI KI EA +LAVP+IIjIQL+ +
Sbjct: 243 ERSKYLDDLGEGAM3DGEKOTIWSAMPDFIFTRVFWIFKIVEAFV1AVPIILIQLLNVV 302

Query: 309 ADVLVIIIMFIFPIALLVSFLPRMQDIIFKVLKVMFGaVSFPALAGFLTLIVFYTQTLIA 368
A +LV+ ++ +FP+ LL+SF+PRMQ+++F VLKVMFG + FPA+ LTL++FY + +1
Sbjct: 303 AQILVLTMILLFPWLLMSFVPRMQELVFGVLKVMFGGLIFPAITTLLTLLIFYIEKMIE 362

Query: 369 TFVKKKFTDGSLLSGSNFKGQAILFMIiLITVFVQGCVFWGIWKYKETFLRLIIGSRASQV 428
V F DG L + + ++F LL++V +G +++ IW++K L+ I+GS+A V
Sbjct: 363 NIVTNGF-DGVLKTLPSLLLFGLVFKLLVSWSKGVIYFLIWRFKGQLLQFILGSKARMV 421

Query: 429 INQSVDKINEKAENLGITPKSIYERAHDMSSIAMMGAGYGVGTMMNAQ DN 478
+ VKEA + P A++ + GAG+G G MMNA+ N
Sbjct: 422 ATDIGTKVEHGVTKSKEVASQV PTRSLATAQHLGNFTLAGAGFGTGVMMNAKSHFQN 478

Query: 479 WNAFKERQQANLDDGQSKTODADKOTEANADDTVISKEAELTNEGEYQSELPKFASKRIE 538
+F R++ + + + + + + +I ++P+KI
Sbjct: 479 AGSFFTRKEPSQPETVMPSGPTEAPITPESPEPIIP PTQTPPDNFKTIG 527

Query: 539 QLGKESSYELSFISEGNSTEEILKNVKSDNHTFQEGDGDTSLTNQDMITNDIENHSNNYT 598
+ + +SEG + E ++ + +
Sbjct: 528 EEKPTPPSDSPIMSEGTPSSE DEFQTLKEEWM 559

Query: 599 SPLKQRKLNKLEGELSQFNSDVSMTKNHGKNAFEKGFNASKTKEVRKQHNLERQSKVLEE 658
SP KQ ++N LE L + +M K G NAF + + + T++ + + N+ER+ ++ +
Sbjct: 560 SPFKQHRINTLERRLDAYKDPQAMYKAQGSNAFTRAYRKTLTRDDKIRANIERRDRLTQR 619

Query: 659 LEKLR 663
L +LR
Sbjct: 620 LNQLR 624

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 563 

A DNA sequence (GBSx0602) was identified in S.agalactiae <SEQ ID 1777> which encodes the amino 
acid sequence <SEQ ID 1778>. This protein is predicted to be conjugative protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3714 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB70617 GB:AJ243106 conjugative protein [Streptococcus thermophilus] 
Identities = 515/757 (68%), Positives = 612/757 (80%), Gaps = 1/757 (0%) 

Query: 1 MSDFFADLADDVKELGLETLDFTVDTLTHEMEIPYQFDWLIGVDLGKGQYNANIKEFIYN 60 

M DF LADD +ELG E L +TVD LT EMEIPYQFDW+IGV L K + A +K+ Y 
Sbjct: 78 MRDFSEALADDSRELGEELLLYTVDRLTDEMEIPYQFDWVIGVTLRKQNHGATVKDLAYE 137

Query: 61 QFESIASNFASLAGYEVEVDEDWYKEHSEEELLVYSLLSTLKAKRLTDVDLFYYQRMQFL 120
F + A GYE + WY ++ +E ++ S L+AKRLT+ +LFYYQRMQ+L
Sbjct: 138 SFNEFSEKIAKGLGYEYALSPTWYDDYRSDEFTIFQAFSVLRAKRLTNEELFYYQRMQYL 197



RY+PH K EV+ANR+ N+TDTLIK L+GGFL+LES YGSSFV++LPVG+F FNGFHL



GE VQR++FPVELR KAEFID K+ G MGRSNTRY IM+EA. NT+TVQQD+I+MG+ S 



LKDLMKKVGNKE+IIEYG YL+V+ SS+NQL+QRR IL+YFDDM V + EAS D PYLF 



QALLYG++LQK TR W H+VTARGFSELM FTNT SGNRIGWYIGRVDN + WDSI +A



Query: 121
Sbjct: 198

Query: 181
Sbjct: 258

Query: 241
Sbjct: 318

Query: 301
Sbjct: 378

Query: 361
Sbjct: 438

Query: 421
Sbjct: 498

Query: 481
Sbjct: 558

Query: 541
Sbjct: 618

Query: 601
Sbjct: 678

Query: 660
Sbjct: 738

Query: 720
Sbjct: 798



I SKN+VL+NATV NKED+AGK+TKNPH+ 1 ITGATGQGKS+LAQ+ 1 FL A QNV+ LY 



+DPKRELR HY +V++ PE+AR++P RKKQI+ NFVTLDSS+ NHGVLDPIV+LDKE 



A AKNML +LL+ ++ +DQ TA+TEAI+ ++ +R AGE VGF V+E L ++ S E 



+ SVGRY +1+ NSILELAFSDG GL+YE RVT+LEV +L LPKD S ISDHE NS 



IALMFALGAFC HFGER+++E T+E FDEAW+LM+S+EGKAV1 K+MRR+GRSK N h L+



+QSVHDAENDDDTTGFGTIF+FYEKSEREDIL HV LEVT NLEWIDNMI SGQCLYYDV 
TQSVHDAENDDDTTGFGTIFAFYEKSEREDILRHVNLEVTESNLEWIDNMISGQCLYYDV 797 

YGNIJSMI S IHNIHPDIDPLLKPMKKTVSSHLENKYAS 756 
YGNLNMI S+HN+ DID LLKPMK TVSS LENKYAS 
YGNIMISVHNLFEDIDMLLKPMKATVSSSLENKYAS 834 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 564 

A DNA sequence (GBSx0604) was identified in S.agalactiae <SEQ ID 1779> which encodes the amino
acid sequence <SEQ ID 1780>. This protein is predicted to be ISL2 protein. Analysis of this protein 
sequence reveals the following: 



Possible site: 26 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.3469 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAC18595 GB:AJ278419 IS1381 transposase [Streptococcus pneumoniae]
Identities = 110/125 (88%) , Positives = 119/125 (95%) 

Query: 81 MNYKASKQLTDWFKRLVGVQRTTFEEMLAVLKTAYQRKHAKGGRTPKLSLEDLLMATLQ 140 
MNYEASKQLTD RFKRLVGVQRTTFEEMM.VLKTAYQ KHAKGGR PKLSLEDLLMATLQ

Sbjct: 1 MIsnfFASKQLTDARFKRLVGVQRTTFEEMIAvLKTAYQLKHAKGGRKPKLSLEDLLMATLQ 60 

Query: 141 YMREYRTYEQIAADFGIHESNLIRRSQWVESTLIQSGFTISKTHLSAEDTVIVDATEVKI 200 
Y+REYRTYE+ IAADFG+HESNL+RRSQWVE TL+QSG TIS+T LS+EDTV++DATEVKI 
Sbjct: 61 YTOEYRTYEEIAADFGVHESNLLRRSQWVEVTLVQSGVTISRTPLSSEDTVMIDATEVKI 120

Query: 201 NRPKK 205 

NRPKK 
Sbjct: 121 NRPKK 125 
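
The percentages in an alignment summary line can be rechecked from the counts given alongside them. The sketch below is illustrative only: the function name `parse_summary` is hypothetical, and it assumes the percentage is the ratio truncated to a whole number. Truncation reproduces the Identities percentages printed in this section (for example 110/125 gives 88%), but some printed Positives values are one point lower than the truncated ratio, so the recomputed figure should be treated as approximate.

```python
import re

# Hypothetical helper for rechecking summary lines such as:
#   Identities = 110/125 (88%) , Positives = 119/125 (95%)
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def parse_summary(line):
    """Return {field: (count, aligned_length, truncated_percent)}."""
    stats = {}
    for name, count, length in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        # Assumption: percentages are truncated, not rounded.
        stats[name] = (count, length, (100 * count) // length)
    return stats

stats = parse_summary("Identities = 110/125 (88%) , Positives = 119/125 (95%)")
print(stats)
```

For the IS1381 transposase hit above, both recomputed percentages agree with the printed values.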


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 565 

A DNA sequence (GBSx0605) was identified in S.agalactiae <SEQ ID 1781> which encodes the amino
acid sequence <SEQ ID 1782>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.58 Transmembrane 39 - 55 ( 32 - 66) 




Final Results 

bacterial membrane Certainty=0.6031 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 566

A DNA sequence (GBSx0606) was identified in S.agalactiae <SEQ ID 1783> which encodes the amino 
acid sequence <SEQ ID 1784>. This protein is predicted to be Cag-W. Analysis of this protein sequence 
reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.82 Transmembrane 50 - 66 ( 49 - 66) 
INTEGRAL Likelihood = -3.72 Transmembrane 25 - 41 ( 23 - 45) 

Final Results 

bacterial membrane Certainty=0.2529 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 567 

A DNA sequence (GBSx0607) was identified in S.agalactiae <SEQ ID 1785> which encodes the amino 
acid sequence <SEQ ID 1786>. Analysis of this protein sequence reveals the following:

Possible site: 55 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.80 Transmembrane 36 - 52 ( 32 - 60) 

Final Results

bacterial membrane Certainty=0.4121 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB12298 GB:Z99106 similar to transposon protein [Bacillus subtilis] 
Identities = 68/339 (20%) , Positives = 133/339 (39%) , Gaps = 49/339 (14%) 

Query: 16 KKEEGGKQPKTKEVKQRTANFIV--YGILGLLFIVGFFGSLRAIGLSNQVQHLKETVIAV 73 
K+ E ++ K K + R+ V + +G L + L +1 +Q+ +K+

Sbjct: 24 KRIERPEKDKQKVPRDRSKLIAVTLWSCVGSLLFICLLAVLLSINTRSQLNDMKDETNKP 83 

Query: 74 EKKSKHKKTDDSLDISRIQYYMNNFVYYYINYS- -QDTADQRKTELENY YSF 123 

K K +++++++ F+ Y+N Q++ ++R LE+Y + 

Sbjct: 84 TNDDKQK ISVTAAENFLSGFINEYMNVKNDQESIEKRMQSLESYMVKQEDNHFED 138

Query: 124 STASMTDDVRKSRTLQTQRLISVEKEKDYYIALMRIGYEV - 163 

D ++ R L+ L +V++ + ++ YE 

Sbjct: 139 EERFNVDGLKGDRELKGYSLYNVKEGDKNS L FQYKVT YENLYPVEKEVEKEVKDGKKKKK 198 

Query: 164 DKKSYQMNLAVPFQMQRGLIAIVSQPYTVAEDLYLGKSKAFEKKTLDQVKEL 215

+K QM L +P + A+ + PY +Y K K + E

Sbjct: 199 VKEKVKTNEKYEKQMLLNIPVTNKGDSFAVSAVPYFT--QIYDLKGDIAFKGKEETRDEY 256

Query: 216 SKEQVSSIQKFLPVFFNKYALINKTDLKLLMKTPELMGKGFKVSELDLNNAIYYQEKKHQ 275

+ E+ SI+ FL FF KYA K ++ +MK PE + E + + ++ KK 

Sbjct: 257 AGEKKESIESFLQNFFEKYASEKKEEMVYMMKKPEALEGNLLFGE--VQSVKIFETKKGF 314 

Query: 276 WQLSVTFEDLVTGGTRSENFTLYLFKADNGWYVEEMYH 314 
V +V F++ +E F+L + + +YV ++ H

Sbjct: 315 EVFCAWFKEKENDIPVNEKFSLEITENSGQFYVNKLKH 353 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1786 (GBS333d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 145 (lane 8-10; MW 58kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 145 (lane 11 & 13; MW 33kDa), in 
Figure 182 (lane 2; MW 33kDa) and in Figure 185 (lane 3; MW 58kDa). 

GBS333d-GST was purified as shown in Figure 236, lane 2. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 568

A DNA sequence (GBSx0608) was identified in S.agalactiae <SEQ ID 1787> which encodes the amino 
acid sequence <SEQ ID 1788>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.4177 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB38326 GB:Y17736 hypothetical protein [Streptomyces
coelicolor A3(2)]
Identities = 45/80 (56%) , Positives = 56/80 (69%)

Query: 4 FTEEAWKDYVSWQQEDKKILKRINRLIEDIKRDPFEGIGKPEPLKYHYSGAWSRRITEEH 63

FT W+DYV W + D+K+ KRINRLI DI RDPF+G+GKPEPLK SG WSRRI + H 
Sbjct: 5 FTSHGWEDYVHWAESDRKVTKRINRLIADIARDPFKGVGKPEPLKGDLSGYWSRRIDDTH 64 


Query: 64 RLIYMIEDGEIYFLSFRDHY 83 

RL+Y D ++ + H HY 
Sbjct: 65 RLVYKPTDDQLVIVQARYHY 84 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 569 

A DNA sequence (GBSx0609) was identified in S.agalactiae <SEQ ID 1789> which encodes the amino 
acid sequence <SEQ ID 1790>. Analysis of this protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5669 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10211> which encodes amino acid sequence <SEQ ID
10212> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD17306 GB:AF121418 putative Phd protein [Francisella 
tularensis subsp. novicida] 
Identities = 26/84 (30%) , Positives = 45/84 (52%) 


Query: 4 MEAI VYSHFRNNLKDYMKKVNDEFEPLI WNKNPDENI\A/LSQDSWESLQETIRLMENDY 63 

M+ + YS FRN L D M +V P+IV + E +V++S + +++ +ET LM + 

Sbjct: 1 MQTVNYSTFRNELSDSMDRVTKNHSPMIVTRGSKKEAVvMMSLEDFKAYEETAYLMRSMN 60 

Query: 64 LSHKVINGISQVKEKQVTKHGLIE 87

++ N I +V+ + LIE 

Sbjct: 61 NYKRLQNSIDEVESGLAIQKELIE 84 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 570 

A DNA sequence (GBSx0610) was identified in S.agalactiae <SEQ ID 1791> which encodes the amino 
acid sequence <SEQ ID 1792>. Analysis of this protein sequence reveals the following:

Possible site: 55 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2407 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 571 

A DNA sequence (GBSx0611) was identified in S.agalactiae <SEQ ID 1793> which encodes the amino 
acid sequence <SEQ ID 1794>. Analysis of this protein sequence reveals the following:

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1274 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10213> which encodes amino acid sequence <SEQ ID
10214> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB60015 GB:U09422 ORF18 [Enterococcus faecalis]
Identities = 41/140 (29%) , Positives = 73/140 (51%) , Gaps = 3/140 (2%) 

Query: 23 FPVEMSELKLALGLREEDDLEYIIADSDCQL-LKEHDSIEMINQFVELVENVDSELVKAV 81

FP++ E+K +GL +E + EY I D + + E+ SI +N+ E+V + EL + 
Sbjct: 26 FPIDFEEvTCEKIGLNDEYE-EYAIHDYELPFTVDEYTSIGELNRLWEMVSELPEELQSEL 84 

Query: 82 HQVIGYTASDFVDYDFNFGDCCLLSDVTTRRELGEYYFDELGVQGVGKEALEMYFDHEAY 141 
++ + +S + + D + SD ++ YY +E G G +L+ Y D++AY

Sbjct: 85 SALLTHFSS - IEELSEHQEDI I IHSDCDDMYDVARYYIEETGALGEVPASLQNYIDYQAY 143 

Query: 142 GRDIDLESQGGFSDYGYVEI 161 
GRD+DL +++G EI 

Sbjct: 144 GRDLDLSGTFISTNHGIFEI 163

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 572

A DNA sequence (GBSx0612) was identified in S.agalactiae <SEQ ID 1795> which encodes the amino 
acid sequence <SEQ ID 1796>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1366 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 573 

A DNA sequence (GBSx0613) was identified in S.agalactiae <SEQ ID 1797> which encodes the amino 
acid sequence <SEQ ID 1798>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1484 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 574 

A DNA sequence (GBSx0614) was identified in S.agalactiae <SEQ ID 1799> which encodes the amino 
acid sequence <SEQ ID 1800>. This protein is predicted to be abortive phage resistance protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2205 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10215> which encodes amino acid sequence <SEQ ID
10216> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB53710 GB:U94520 abortive phage resistance protein 
[Lactococcus lactis]
Identities = 131/499 (26%) , Positives = 210/499 (41%) , Gaps = 97/499 (19%)

Query: 3 MFSKIEFKNFMSFSNLT FDLLNRGKCKDIIAIYGENGSGKTN 44 

M F+NF+SF L+ D+ N K + IYG N SGK++ 

Sbjct: 1 MLWFRFENFLSFDKLSTFSMAPGKSRQHMEDLIELDIKNNQKLLKLSTIYGANASGKSS 60 

Query: 45 IVEAF---KLLVL SLQSMESLNENTRLQSLLKEQTNKE ENQKTNFGDISEIL 93 

V+A K L++ L S N+NT SL + + E E++ ++G S IL 

Sbjct: 61 FVDAIGISKSLIIRGFYNGLVLSNSYNKimroNSIiNETKFEYEIVIEDKVYSYG-FSVIL 119 

Query: 94 DKISFFTTFKGIAKOTHRIASEGISrriLKYYFNIEKDNGYYLLEYNENNELVKEELVFKIK 153 

F + + N ++ Y KDN YN N+E L + 

Sbjct: 120 SLKKFMSEWLYDITNDEKM IYTIDRKDN SYNINDEF LNLDEQ 161 

Query: 154 SNKGVHFSITNIDGLSQSLNKTIFKNTIFKDLTEQIEKYWGKHTFLSIFN- -NYCLEV- - 209
SN + I + S + N +F N++ D + IE F +FN N LEV 

Sbjct: 162 SNNRISIYIDD SANDOTQLFLNSL-lffiGKXTIESKDNSTIFKKVFNWFNm'LEVLG 216 

Query: 210 NEEF INEQVSINFQKVVDEFDKIFIWSGNFRGPFHSTELLLK 251 

EEF + + + +N V+D N P E +L

Sbjct: 217 PGDEARGS1ASLTQEEEEFKEDLGKYLELNDTGVIDIVQVPVDNIjSNV~-PAKLQERILD 274 

Query: 252 DISKGKIDKSEKEKLSYTEEIIYKYFSALYIDIKDVKYKQDAQGQEIKYELMIRKNIGGD 311 
+1+ I K +KE+ EI F+ + +++ Q+ Q +EL K+ G 

Sbjct: 275 NITT-DIKKKKKER EDIEISFNTILNTSQNIYIIQNNDEQFEYFELKF-KHKNGT 327

Query: 312 LLDVPISLESQGTKNLLDLLKV-FNNVLDGKICIVDEIDSGIHDLLMNSILNDLK--GSV 368 

L +S ES GT L++L V F+N D K+ ++DEID +H LL + + K S+ 
Sbjct: 328 LYS- -LSEESDGTVRLIELFSVLFHN- -DEKVFVIDEIDRSLHPLLTYMFIESFKKQKSI 383 


Query: 369 NGQLIFTTHDT TLL- - KELSPSSAYFLNVDIKGNKVIISGNEADKKIGVNNNLEKLYLSG 426 

N QLI TTH+ +L + L +F++ + +GN + S E ++ + ++ YL+G 

Sbjct: 384 N-QLIVTTHEDYILNFELLRRDEVWFVDKNFEGNSSMFSLEEFKERF--DKDINTSYLNG 440 

Query: 427 FFGAVPDPLDIDFSDLFLD 445

+G +P+ L FS+ D 
Sbjct: 441 RYGGIPN-LSCLFSEFAKD 458 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 575 

A DNA sequence (GBSx0615) was identified in S.agalactiae <SEQ ID 1801> which encodes the amino 
acid sequence <SEQ ID 1802>. This protein is predicted to be repressor (rstR-1). Analysis of this protein 
sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3724 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB84427 GB:AF027868 transcription regulator [Bacillus subtilis]

Identities = 31/81 (38%) , Positives = 53/81 (65%) , Gaps = 2/81 (2%) 

Query: 9 QKLKELRKEKKLTQTEIASKLNISQKSYSNWESGKAEPTLDNIIKLANILDVTVDYLLGR 68
Q+L++LRK KLT +LA K+ I++ SY +E+ +P LD ++ LA + DV+VDY+LG 
Sbjct: 4 QRLRQLRKAHKLTMEQLAEKIGIAKSSYGGYEAESKKPPLDKLVILARLYDVSVDYILGL 63

Query: 69 SDNFSNTIVLSKNNMKSFSKR 89

+D+ + + N+K F ++ 
Sbjct: 64 TDDPDPKV- -ERKNLKEFLEK 82 


There is also homology to SEQ ID 1740. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 576 

A DNA sequence (GBSx0616) was identified in S.agalactiae <SEQ ID 1803> which encodes the amino
acid sequence <SEQ ID 1804>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3607 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 577 

A DNA sequence (GBSx0617) was identified in S.agalactiae <SEQ ID 1805> which encodes the amino
acid sequence <SEQ ID 1806>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.0564 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10217> which encodes amino acid sequence <SEQ ID
10218> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12294 GB:Z99106 similar to transposon protein [Bacillus subtilis] 
Identities = 93/348 (26%) , Positives = 164/348 (46%) , Gaps = 28/348 (8%) 


Query: 81 SRLQVMIDYTOITLKDWDLEFFCRNFLHC^FKEFQPFESKLMNYNHLWKRGDIWIFDFA 140 

S L M+DY+R++ K D++ LH + +S Y ++ I +F A 

Sbjct: 26 SPLVSMVDYIRVSFK-THDTORIIEEVLHLSKDFMTEKQSGFYGYVGTYELDYIKVFYSA 84 

Query: 141 DKHETGNFQITVQLSGRGCRQLELLMETEKFTWHDWLSYLRNSYRDDMNVTRFDIAIDEL 200

G + +++SG+GCRQ E +E K TW+D + ++ + + TRFD+AID+ 
Sbjct: 85 PDDNRG VLIEMSGQGCRQFESFLECRKKTWYD FFQDCMQQGGSFTRFDLAIDD- 137 



Query: 201 YLGKDRENEQFHLSDMISKYYRHELDFESLRTWNYIGGGSIjNFSDMEEIEQNRQGISLYF 260 
+ F + +++ K + E R ++ GS + SD G ++YF

Sbjct: 138 KKTYFSIPELLKKAQKGEC- ISRFRKSDF- -NGSFDLSD GITGGTTIYF 183

Query: 261 GSRQSEMYFNFYEKRYEIAKQEGITVEFALEIFELWNRYEIRLSQSKANAAVDEFISGVP 320

GS++SE Y FYEK YE A++ I +EE + WNRYE+RL +A A+D + 
Sbjct: 184 GSKKSEAYLCFYEKNYEQAEKYNI PLEELGD WNRYELRLKNERAQVAIDALLKTKD 239 

Query: 321 IGEISRGLIVSKIDVYDGKNEY--GSFQADRKWQLMFGGVEPLKFVTKPEAYSIERTLRW 378 

+ 1+ +1 + + D ++ W G V L KP+ +++ W 

Sbjct: 240 LTLIAMQIINNYVRFVDADENITREHWKTSLFWSDFIGDVGRLPLYVKPQKDFYQKSRNW 299 

Query: 379 LSDSVSPSLAMIREYDMIVDGDYLQTILNSGEVNERGEKILDSIKASL 426 

L +S +P++- M+ ED 4- L ++ E+ ++ +K+LD A + 
Sbjct: 300 LRNSCAPTMIOWLEADEHLGKTDLSDMIAEAEIADKHKKMLDVYMADV 347 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8603> and protein <SEQ ID 8604> were also identified. Analysis of this 
protein sequence reveals an RGD motif at residues 131-133.

The protein has homology with the following sequences in the databases: 

29.4/54.5% over 342aa 

Bacillus subtilis 

EGAD|108511| hypothetical protein Insert characterized OMNI|NT01BS0566 conserved
hypothetical protein Insert characterized

GP|1881297|dbj|BAA19324.1||AB001488 SIMILAR TO ORF20 OF ENTEROCOCCUS FAECALIS TRANSPOSON
TN916. Insert characterized

GP|2632787|emb|CAB12294.1||Z99106 similar to transposon protein Insert characterized

PIR|G69774|G69774 transposon-related protein homolog ydcR - Insert characterized

ORF0010K205 - 1581 of 1887) 

EGAD|108511|BS0487(6 - 348 of 352) hypothetical protein {Bacillus subtilis} OMNI|NT01BS0566
conserved hypothetical protein GP|1881297|dbj|BAA19324.1||AB001488 SIMILAR TO ORF20 OF
ENTEROCOCCUS FAECALIS TRANSPOSON TN916. {Bacillus subtilis}

GP|2632787|emb|CAB12294.1||Z99106 similar to transposon protein {Bacillus subtilis}
PIR|G69774|G69774 transposon-related protein homolog ydcR - Bacillus subtilis
%Match =9.7 

%Identity =29.3 %Similarity =54.4 

Matches = 103 Mismatches = 146 Conservative Sub.s = 88 

153 183 213 243 273 489 519 

GW*RSENGHAGHSAHTRALQRDLSILKPPFSNRGVRNEKFRILTPKNLYVSRVFR EQGKRKLTLEKYQEIKSHFG 

: =11 =1111 

MDELKQPPHANRGV 



567 597 627 657 687 717 747 

YLV--ENDS--SRLQVMIDYVRITLKDVRDLEFFCRNFLHCAFKEFQPFESKLMNYNHLWKRGDIWIFDFADKHETGNFQ 

:| =l== I I l=l!=l===l l== II : :| : | :: | :| | | 

VIVKEKNEAVESPLVSMVDYIRVSFK-THDVDRIIEEVLHLSKDFMTEKQSGFYGYVGTYELDYIKVFYSAPDDNRG 

30 40 50 60 70 80 90 



777 807 837 867 897 927 957 987 

ITVQLSGRGCRQLELLMETEKFTWHDWLSYLRNSYRDDMNVTRFDIAIDELYLGJCDRENEQFHLSDMISKYYRHELDFES 

= :::||:|||| = 1 = = l I I 1 = 1 = = = = = =1111 = 111 1= I = = = = I =1 

VLIEMSGQGCRQFESFLECRKKTWYD FFQDCMQQGGSFTRFDLAID DK- KTYFS I PELLKKAQKGEC - ISR 

100 110 120 130 140 150 



1017 1047 1077 1107 1137 1167 1197 1227 

LRTWNYIGGGSLNFSDMEEIEQNRQGISLYFGSRQSEMYFNFYEKRYEIAKQEGITVEEALEIFELWNRYEIRLSQSKAN 

= 1 == ||:::|| I = = llll = :|l 1= Mil II l = = I =11 = 11111 = 11 =1 

FRKSDF- -NGSFDLSD GITGGTTIYFGSKKSEAYLCFYEKNYEQAEKYNIPLEE LGDWNRYELRLKNERAQ 

170 180 190 200 210 220 



1257 1287 1341 1371 1401 1431 1461 

AAVDEFISGVPIGEISRGLIVSKIDVYDG-KN-EYGSFQ^RKWQLMFGGVEPLKFVTKPEAYSIERTLRWLSDSVSPSL 
|:| :: : |: :| : : | =| :: | I | I = 11= === II =1 =l== 

VAIDALLKTKDLTLIAMQIINNYWFVDADF^ITREHWKT^^
240 250 260 270 280 290 300

1491 1521 1551 1581 1611 1641 1671 1701 

AMIREYDMIVDGDYLQTILNSGEVNffiRGEKILDSIKASLGIL* 

|: | | : | :: |: :: :|:|| | s :

KMVLEADEHLGKTDLSDMIAE1AELADKHKKMLDVYMADVADMV\^ 
320 330 340 350 

SEQ ID 8604 (GBS294) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 167 (lane 6 & 7; MW 65kDa - thioredoxin fusion), in Figure 238 (lane 2; MW
65kDa) and in Figure 40 (lane 6; MW 37kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 47 (lane 3; MW 76kDa). 

Purified Thio-GBS294-His is shown in Figure 244, lane 2. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 578 

A DNA sequence (GBSx0618) was identified in S.agalactiae <SEQ ID 1807> which encodes the amino
acid sequence <SEQ ID 1808>. Analysis of this protein sequence reveals the following:

Possible site: 40 
20 >>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.61 Transmembrane 24-40(20-41) 
INTEGRAL Likelihood = -1.97 Transmembrane 53 - 69 ( 52 - 72) 

Final Results 

25 bacterial membrane — Certainty=0. 2444 (Affirmative) < suco 

bacterial outside — Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB60012 GB:U09422 ORF21 [Enterococcus faecalis]

Identities = 136/473 (28%) , Positives = 228/473 (47%) , Gaps = 40/473 (8%) 

Query: 9 RGIKVKPYMRYMSYYL-FSFLFILFLTPVGVYSYYYLDL LKMMDKMSM 1 56 

RG +++P + + ++ + L +FL VG++ + L DK+ + I 

Sbjct: 4 RGKRIRPSGKDLVFHFTIASLLPVFLLWGLFHVKTIQQINWQDFNLSQADKIDIPYLII 63

Query: 57 SVGTGLFLAFFVSWYLTWFLQEANPLFNKLDRLKRMSKFLYENGYVYEKR KKS 109 

S + + V++ F + +L ++++K + EN + ++ K S 

Sbjct: 64 SFSVAILICLLVAFV FKRVRYDTVKQLYHRQKLAKMILENKWYESEQVKTEGFFKDS 120 

Query: 110 NKKTKTKYR-FPKVYVKQGKYDLSVSFEMAGGKFQKKFKDIGGELEDTFFMDFMEKTDDP 168 

+TK K FPK+Y + + + E+ GK+Q + + +LE + + +K 

Sbjct: 121 AGRTKEKITYFPKMYYRLKNGLIQIRVEITLGKYQDQLLHLEKKLESGLYCELTDKELKD 180 

Query: 169 RFKIYKIAYSAFLSRITVKDVIWNKDKGIKLMDGYYWDFINDPHLLVAGGTGGGKTVLLR 228

+ Y L Y SRI++ D + KD ++LM +W++ PH+L+AGGTGGGKT + 
Sbjct: 181 SYvEYTLLYDTIASRISI-DEVEAKDGKLRLMKNWWEYDKLPHMLIAGGTGGGKTYFIL 239 

Query: 229 SILRCLAEI-GVCDICDPKRADFVTMSDLSAFEGRIAFEKADIIEKFENAVTIMFARYDF 287 
+++ L I DPK AD ++DL + + + K D++ E M R +

Sbjct: 240 TLIEALLHTDSKLYILDPKNAD LADLGSVMANVYYRKEDLLSCIETFYEEMMKRSE - 295 

Query: 288 VRNEMKRLGHKDMKKFYDY-GLEPYFFVCDEYNALMSSLSYQEREIVDNAFTQYILLGRQ 346 
EMK++ + K Y Y GL +F + DEY AM L +E V N Q ++LGRQ 
Sbjct: 296 ---EMKQMKNYKTGKtWAYLGLPAHFLIFDEYVAFMEMLGTKEOTAVMNKLKQIVMLGRQ 352



Query: 347 VGCNAIIAMQKPSADDLPTKIRSNMMHHISVGRI^DGGYVmFGDENRNKEFRFIKYLAG 406 
G I+A Q+P A L IR +++GR+ + GY MMFG + + K+F F+K
Sbjct: 353 AGFFLILACQRPDAKYLGDGIRDQFNFRVALGRMSEM6YGMMFGSDVQ-KDF-FLK 406

Query: 407 RRVYGRGYSAVFGEVAREFYSPLLPKNFSFYDAFEKINRHENPFDPTENQEVS 459 

R+ GRGY V V EFY+PL+PK + F + +K++ T EV+ 

Sbjct: 407 -RIKGRGYVDVGTSVISEFYTPLVPKGYDFLEEIKKLSNSRQSTQATCEAEVA 458 

No corresponding DNA sequence was identified in S.pyogenes. 
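
The summary line that precedes each alignment above (for example "Identities = 136/473 (28%) , Positives = 228/473 (47%) , Gaps = 40/473 (8%)") can be read mechanically. The following is a minimal Python sketch, not part of the original analysis; the function name parse_blast_summary is hypothetical, and the sketch only recomputes each percentage from the reported count and alignment length so that it can be compared with the printed value. The printed percentages appear to be whole numbers, so a recomputed value may differ from the reported one by about a point.

```python
import re

# One "Name = count/length (pct%)" field of a BLAST-style summary line.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_summary(line):
    """Return {field: (count, length, reported_pct, computed_pct)}."""
    summary = {}
    for name, count, length, pct in _FIELD.findall(line):
        count, length = int(count), int(length)
        # computed_pct is recalculated from the two integers on the line
        summary[name] = (count, length, int(pct), 100.0 * count / length)
    return summary


if __name__ == "__main__":
    line = ("Identities = 136/473 (28%) , Positives = 228/473 (47%) , "
            "Gaps = 40/473 (8%)")
    for name, (count, length, reported, computed) in parse_blast_summary(line).items():
        print(f"{name}: {count}/{length}, reported {reported}%, "
              f"recomputed {computed:.1f}%")
```

A field that is absent from the line (some alignments report no gaps) is simply missing from the returned dictionary.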

A related GBS gene <SEQ ID 8605> and protein <SEQ ID 8606> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 8 

McG: Discrim Score: -10.05 

GvH: Signal Score (-7.5): -3.42 
Possible site: 40 

>>> Seems to have no N- terminal signal sequence 

ALOM program count: 2 value: -3.61 threshold: 0.0 

INTEGRAL Likelihood = -3.61 Transmembrane 24 - 40 ( 20 - 41) 
INTEGRAL Likelihood = -1.97 Transmembrane 53 - 69 ( 52 - 72) 
PERIPHERAL Likelihood = 1.01 224 
modified ALOM score: 1.22 
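
The "INTEGRAL" lines in the report above follow a fixed layout: a likelihood value, a transmembrane residue range, and a second range in parentheses. As an illustration only, the sketch below parses such a line in Python. The names Segment and parse_integral are hypothetical, and the parenthesised pair is stored as reported without any assumption about its meaning.

```python
import re
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    likelihood: float
    start: int        # first residue of the reported transmembrane range
    end: int          # last residue of the reported transmembrane range
    outer_start: int  # first value of the parenthesised pair, as printed
    outer_end: int    # second value of the parenthesised pair, as printed


_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str) -> Optional[Segment]:
    """Parse one INTEGRAL line; return None for any other line."""
    m = _INTEGRAL.search(line)
    if m is None:
        return None
    return Segment(float(m.group(1)), *(int(g) for g in m.groups()[1:]))


if __name__ == "__main__":
    seg = parse_integral(
        "INTEGRAL Likelihood = -3.61 Transmembrane 24 - 40 ( 20 - 41)")
    print(seg, "length:", seg.end - seg.start + 1)
```

The pattern tolerates the variable spacing seen in this text, so "24-40(20-41)" and "24 - 40 ( 20 - 41)" parse identically.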

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

29.9/52.7% over 456aa 

Enterococcus faecalis 

EGAD|17035| hypothetical protein Insert characterized
GP|532554|gb|AAB60012.1||U09422 ORF21 Insert characterized

ORF00100(319 - 1677 of 2316)

EGAD|17035|17250(2 - 458 of 461) hypothetical protein {Enterococcus faecalis}
GP|532554|gb|AAB60012.1||U09422 ORF21 {Enterococcus faecalis}
%Match =11.2 

%Identity =29.9 %Similarity =52.7 

Matches = 135 Mismatches = 199 Conservative Sub.s = 103 

207 237 267 297 327 357 384 414 

FQWCLKFLHHHLRKRMLQIMETHQKMKHLKLINKR*RRGNIJ^LIPQYRGIKVKPYMRYMSYYL-FSFLFILFLTPVGV 

: || :::| : : ::: : |: :|| ||: 
MKQRGKRIRPSGKDLVFHFTIASLLPVFLLWGL 
10 20 30 

426 453 483 513 570 600 

y SYYYLDL-LKMMDKMSMISVGTGLFLAFFVSWYLTWFLQFJW-PLFNKLDRLKRMSKFLYENGYVYEKR 

: |: | ||: : : : :| :: : : :: :| ::::| : || : || 

FHWTIQQINWQDFNLSQADKIDIPYLIISFSVAILICLLVAFVFKRWYDTVTCQLYHRQKIiAKMILENKW-YESEQVKT 
50 60 70 80 90 100 110 

636 663 693 723 753 783 813 843 

KKSNKKTKTKYR- FPKVYVKO^KYDLSVSFEMAGGKFQKKFKDIGGELEDTFFMDFMEKTDDPRFKIYKLAYSAFL 

II =|| I MM = : : 1: 11=1 = = = =11 =:===! =111 

EGFFKDSAGRTKEKITYFPKMYYRLKNGLIQIRVBITLGKYQDQLLHLEKKLESGLYCELTDKELKDSYVEYTLLYDTIA 
130 140 150 160 170 180 190 

873 903 933 963 993 1020 1050 1080 

SRITVKDVIWNKDKGIKLMDGYYTOFINDPHLLVAG^ 

lll = : I = II = = 11 =1 = = I I = I = I I I I I I I I I = : = = = I I III II = = ll = 
SRISI-DEvEAKDGKLRLMKNWJWEYDKLPHMLIAGGTGGGKTYFILTLIEALLHTDSKLYILDPKNAD LADLGSVM 

210 220 230 240 250 260
1110 1140 1170 1200 1227 1257 1287 1317

GRIAFEKADIIEKFENAVTIMFARYDBWNEMKRLGHKDMKKFYDY-GLEPYFFVCDEYNALMSSLSYQEREIVDNAFTQ 






ANVYYRKEDLLSCIETFYEEMMKR- 
280 290 



SEEMKQMKNYKTGKNYAYLGLPAHFLIFDEYVAFMEMLGTKENTAVMNKLKQ 
300 310 320 330 340 



1347 1377 1407 1437 1467 1497 1527 1557 

YILLGRQVGCNAIIAMQKPSADDLPTKIRSNI«HHISVGRLDDGGYVMMFGDENRNKEFRFIKYLAGRRVYGRGYSAVFG 






IVMLGRQAGFFLILACQRPDAKYLGDGIRDQFNFRVALGRMSEMGYGMMFGSD-VQKDF-FLKRIKGR GYVDVGT 

360 370 380 390 400 410 




1587 1617 1647 1677 1707 1737 1767 1797 

EVAREFYSPLLPKNFSFYDAFEKINRHENPFDPTENQEVSKAILKDESLREFVEKTSENELLKGSVGFDFDDEMEEMENM 







430 440 450 460 



SEQ ID 8606 (GBS216) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 42 (lane 3; MW 66.6kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 47 (lane 2; MW 91kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 579 

A DNA sequence (GBSx0619) was identified in S.agalactiae <SEQ ID 1809> which encodes the amino
acid sequence <SEQ ID 1810>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4095 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 580

A DNA sequence (GBSx0620) was identified in S.agalactiae <SEQ ID 1811> which encodes the amino
acid sequence <SEQ ID 1812>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0944 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10219> which encodes amino acid sequence <SEQ ID
10220> was also identified.



WO 02/34771 



PCT/GB01/04789 



-664- 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 581

A DNA sequence (GBSx0621) was identified in S.agalactiae <SEQ ID 1813> which encodes the amino 
acid sequence <SEQ ID 1814>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -4.94 Transmembrane 810 - 826 ( 808 - 830)

Final Results

bacterial membrane --- Certainty=0.2975 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

!GB:D90354 surface protein antigen precursor [Strept...

>GP:BAA14368 GB:D90354 surface protein antigen precursor

[Streptococcus sobrinus] 
Identities = 151/408 (37%) , Positives = 219/408 (53%) , Gaps = 27/408 (6%) 

Query: 451 PSKAVIDEAGQSWGKTVLPNAEIiNYVAKQDFSQYKGMTASQGKIAKNFVFIDDYKDDAL 510 
P K +E G ++GK+VL Y D QYKG +++ I K F ++DDY ++AL

Sbjct: 1162 PHKVNKl^NGWIDGKSvLAGTTNYYELITOLDQYKGDKSAKETIQKGFFYVDDYPEEAL 1221 

Query: 511 DGKSMKVNSIKASDGTDVSQL-LEMRHVLSTOTIjDEKLQTLIKEAGISPVGEFYMWTAKD 569 
D ++ + IK +D + + + S + +Q ++K+A I+P G F ++TA D 

Sbjct: 1222 D LRTDLIKLTDANGKAVTGVSVADYASLEAAPAAVQDMLKKANITPKGAFQVFTADD 1278

Query: 570 PQAFYKAYVQKGLDVTYNLSFKVKKEFTK- -GQIQNGVAQIDFGNGYTGNIWNDLTTPE 627 

PQAFY AYV G D+T VK E K G +N QIDFGNGY NIV+N++ 

Sbjct: 1279 PQAFYDAYVVTGTDLTIVTPMTVKAEMGKIGGSYENKAYQIDFGNGYESNIVINNVPQIN 1338 

Query: 628 IHKDV---LDKEDGKSINNGTWLGDEVTYKLEGWVVPTGRSYDLFEYKFVDQLQRTPDL 684 

KDV +D D +++ T+ L Y+L G ++P + +LFEY F D +T D 

Sbjct: 1339 PEKDVTLTMDPADSTNvTJGQTIALNQVFNYRLIGGIIPADHAEELFEYSFSDDYDQTGDQ 1398 

Query: 685 YLRD-KWAKVDWLKDGWIKKGTNLGEYTETVYNKKTGLYELVFKKDFLEKVARSSEF 743

Y K AKVD+TLKDGT+IK GT+L YTE ++ G + FK+DFL V+ S F 
Sbjct: 1399 YTGQYKAFAKVDLTLKDGTIIKAGTDLTSYTEAQVDEANGQIVVTFKEDFLRSVSVDSAF 1458 

Query: 744 GADDFVWKRI KAGDVYNTADFFINGNKVKTETWTHTPE - - KPKPVEPQ 791 

A+ ++ +KRI G NT +NG + TV T TPE +P PV+P+

Sbjct: 1459 QAEWLQMKRIAVGTFAOTYVNTVNGITYSSNTVRTSTPEPKQPSPVDPKTTTTWFQPR 1518 

Query: 792 - -KATPKAPAKG- -LPQTGEASVAPLTALGAIILSA-IGLAGFKKRKE 834 
KA AP G LP TG++S A L LG + L+A L G +++++ 
Sbjct: 1519 QGKAYQPAPPAGAQLPATGDSSNAYLPLLGLVSLTAGFSLLGLRRKQD 1566

Identities = 75/242 (30%) , Positives = 120/242 (48%) , Gaps = 33/242 (13%) 

Query: 11 SADQWTQATTQTvTQNQAETVTSTQLDKAVATAKKAAVAVTTTAAVNHATTTDAQADLA 70 
S+ T+QA T + V++++LD+A +A++A VV+AVNT + DA 

Sbjct: 73 SSQAETSQAQAGQKTGAMSVDVSTSELDEAAKSAQEAGVTVSQDATVNKGTVETS--DEA 130

Query: 71 NQTQT-VKDvTAKAQANTQAIKDATAENAKIDAENKAESQRVSQLNAQTKAKID AEN 126 

NQ +T +KD +K A+ 1+ T + A N+AE+ R++Q NA KA+ + AN 

Sbjct: 131 NQKETEIKDDYSKQAAD IQKTTEDYKAAVAANQAETDRITQENAAKKAQYEQDLAAN 187 

Query: 127 KDAQAKADATNAQLQKDYQAKLAKI KSVEAYNAGVRQRNKDAQA KA 172

K + NAQ + DY+AKLA+ + A V+Q N' D+QA + 
Sbjct: 188 KAEVERITNENAQAKADYEAKLAQYQKDLA AVQQANNDSQAAYAAAKEAYDKELARV 244 

Query: 173 DATNAQLQKDYQAKLA LYNQALKAKAEftDKQSINNVAFDIKAQ AKGVDNAEYG 225 

A NA +K+Y+ LA N+ +KA+ A +Q D +A+ K + A+ G 

Sbjct: 245 QAANAAAKKEYEEALAAOTTKNEQIKAENA&IQQ 304 

Query: 226 NS 227 
N+ 

Sbjct: 305 NA 306 

Identities = 63/223 (28%) , Positives = 100/223 (44%) , Gaps = 31/223 (13%) 

Query: 2 ITTLQTSQVSADQVTTQATTQTVTQNQAETVTSTQLDKAVATAK KAAVA 50 

+ +Q + +A + +A T+N+ + + + A AK K A 

Sbjct: 241 LARVQAANAAAKKEYEEALAAtWTKNEQ I KAENAAI QQRNAQAKADYEAKLAQYEKDLAA 300 

Query: 51 VTTTAAvNHATTTDAQADIANQTQTVKDVTAKA-QANTQAIKDATAENAKIDAENKAESQ 109 

+ A N A +A + V+ A A QA QA+ TA+NA+I AEN+A Q 

Sbjct: 301 AQSGNATNE1ADYQAKKAAYEQELARVQAANAAAKQAYEQA1AAOTAKNAQITAENEAIQQ 360 

Query: 110 RVSQLNAQTKAKIDAENKDAQAKADATNAQLQKDYQAKLA KI KSVEAYNAGVRQRN 165 

R +Q A +AK+ KD A A + NA + DYQ KLA ++ V+A NA +Q 
Sbjct: 361 RNAQAKANYEAK1AQYQKDL-AAAQSGNAANEADYQEKLAAYEKELARVQAANAAAKQEY 419 

Query: 166 KDAQAKADATNAQL QKDYQAKLALYNQAL 194 

+ +A+A NA++ + DY+ KL+ Y + L 

Sbjct: 420 EQKVQEANAKNAE ITEANRAI RERNAKAKTDYELKLSKYQEEL 462 

Identities = 75/243 (30%) , Positives = 101/243 (40%) , Gaps = 56/243 (23%) 

Query: 8 SQVSAD - QWTQATTQTVTQNQAETVTSTQIiDKAVATAKKAAVAVTTTAAVNHATTTDAQ 66 

S+ +AD Q TT+ V NQAET TQ + A A+ A V T +AQ 

Sbjct: 142 SKQAADIQKTTEDYKAAVAANQAETDRITQ-ENAAKKAQYEQDLRANKAEVERITNENAQ 200 

Query: 67 ADL---ANQTQTVKDVTAKAQANT QAIK 91 

A A Q KD+ A QAN +A+ 

Sbjct: 201 AKADYEAKIAQYQKDLAAVQQANNDSQAAYAAWCEAYDKELARVQAANAAAKKEYEEAIA 260 

Query: 92 DATAENAKIDAENKAESQRVSQIJS[AQTKAKIDRE^^Q^AQAKADATNAQLQKDYQAKLA-- 149 

T +N +1 AEN A QR +Q A +AK+ KD A A + NA + DYQAK A 
Sbjct: 261 ANTTKNEQIKAENAAIQQRNAQAKADYEAKLAQYEKDL-AAAQSGNATNEADYQAKKAAY 319 

Query: 150 - - KIKSVEAYNAGVRQRNKDAQAKADATNAQL QKDYQAKLALYNQA 193 

++ V+A NA +Q + A A A NAQ+ + +Y+AKLA Y + 

Sbjct: 320 EQEIARVQAANAAAKQAYEQALAANTAKNAQITAENEAIQQRNAQAKANYEAKLAQYQKD 379 

Query: 194 LKA 196 
L A 

Sbjct: 380 LAA 382 

There is also homology to SEQ ID 598. 

SEQ ID 1814 (GBS191) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 176 (lane 2; MW 91kDa). 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 582 

A DNA sequence (GBSx0622) was identified in S.agalactiae <SEQ ID 1815> which encodes the amino 
acid sequence <SEQ ID 1816>. This protein is predicted to be TnpA. Analysis of this protein sequence 
reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2935 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10221> which encodes amino acid sequence <SEQ ID
10222> was also identified.

A related GBS nucleic acid sequence <SEQ ID 9921> which encodes amino acid sequence <SEQ ID 9922> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC82523 GB:AF027768 TnpA [Serratia marcescens]
Identities = 168/385 (43%) , Positives = 232/385 (59%) , Gaps = 13/385 (3%)

Query: 26 MMFKVEAVGPPERCPECGFD-KLYKHSSRNQLINIDLPIRLKRVGLHIoNRRRYKCRECGST 84 

M F+V+ V P C ECG + + R+ DLPI KRV L + RRRY CR C +T 
Sbjct: 1 MHFQVD-VPDPIACEECGVQGEFVRFGKRDVPYRDLPIHGKRVTLWWRRRYTCRACKTT 59 

Query: 85 IS VDEKRSMTKRLLKSIQEQSMSKTFVEVAESVGVDEKTIRNVFKDYVALKERE 138 

VD R MT RL + ++++S + + VA G+DEKT+R++F R 
Sbjct: 60 FRPQLPEMVDGFR-MTLRLHEYvEKESFNHPYTFVAAQTGLDEKTVRDIFNARAEFLGRW 118 

Query: 139 YQFETPKWLGIDEIHIIRRPRLVLTNIERRTIYDIKPNRNKETVIQRLSEISDRTYIEYV 198

++FETP+ LGIDE+++ +R R +LTNIE RT+ D+ R. ++ V L ++ DR +E V 
Sbjct: 119 HRFETPRILGIDELYI^KRYRCILTNIEERTLLDLLATRRQDvvTNYLMKLKDRQKVEIV 178 

Query: 199 TMDMWKPYKDAVOTILPQAKOTVDKFHVVRMANQALDN^ 258 
+MDMW PY+ AV +LPQA++WDKFHWRMAN AL+ VRK L+ + + RTL +R

Sbjct: 179 StffiMWNPYRAAvT^AVLPQARIVvDKFHVVRMANDALERTOKGLRKELKPSQSRTLKGDRK 238 

Query: 259 ILLKRKHDLNERESFLLDTWLGNLPALKEAYELKEEFYWIWDTPDPDEGHLRYSQWRHRC 318 
ILLKR H++++RE +++TW GPL AYE KE FY I WD + +W 

Sbjct: 239 ILLKRAHEVSDRERLIMETWTGAFPQLLAAYEHKERFYGIWDATTRLQAEAALDEWI-AT 297

Query: 319 MSSNSKDAYKDLVRAVDNWHVEIFNYF- -DKRLTNAYTESINSIIRQVERMGRGYSFDAL 376 

+ K+ + DLVRAV NW E YF D +TNAYTESIN + + R GRGYSF+ + 
Sbjct: 298 I PKGQKEVWSDLVRAVGNWREETMTYFETDMPVTNAYTES INRLAKDKNREGRGYSFE VM 357 



Query: 377 RAKILFNEKLHKKRKPRFNSSAFNK 401 

RA++L+ K HKK+ P S F K 
Sbjct: 358 RARMLYTTK- HKKKAPTAKVS PF YK 381 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 583 

A DNA sequence (GBSx0623) was identified in S.agalactiae <SEQ ID 1817> which encodes the amino 
acid sequence <SEQ ID 1818>. This protein is predicted to be mercuric reductase. Analysis of this protein
sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.2115 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA70224 GB:Y09024 mercuric reductase [Bacillus cereus] 
Identities = 411/546 (75%) , Positives = 483/546 (88%) 



Query: 1 MNKFKVNI SGMTCTGCEKHVESALEKIGAKNIESSYRRGEAVFELPDD IEVESAI KAI DE 60
M K++V++ GMTCTGCE+HV ALE +GA IE +RRGEAVFELP+ + VE+A KAI +
Sbjct: 1 MKKyRVDVQGMTCTGCEEHVAVALENMGATGIEVDFRRGEAVFEL 60

Query: 61 ANYQAGEIEEVSSLENVALINEDNYDLLIIGSGAAAFSSAIKAIEYGAKVGMIERGTVGG 120
A YQ G+ EEV S E V L NE +YD +IIGSG AAFSSAI+A++YGAKV MIERGT+GG
Sbjct: 61 AKYQPGKAEEVQSQEMVQLGNEGDYDYIIIGSGGAAFSSAIEAVKYGAKVAMIERGTIGG 120

Query: 121 TCVNIGCTPSKTLLRAGEINHLSKDNPFIGLQTSAGEVDLASLITQKDKLVSELRNQKYM 180
TCVNIGCVPSKTLLRAGEINHL+K+NPF+GL TSAGEVDLA LI QK++LV+ELRN KY+
Sbjct: 121 TCVNIGCVPSKTLLRAGEINHLAKNNPFVGLHTSAGEVDLAPLIKQKNELOTELRNSKYV 180

Query: 181 DLIDEYNFDLIKGEAKFVDASTVEVNGTKLSAKRFLIATGASPSLPQISGLEKMDYLTST 240
DLID+Y F+LI +GEAKFVD TVEVNG +SAKRFLIATGASP+ P I GL ++DYLTST
Sbjct: 181 DLIDDYGFELIEGEAKFVDEKTVEVNGAPISAKRFLIATGASPAKPNIPGIiNEVDYLTST 240

Query: 241 TLLELKKIPKRLTVIGSGYIGMELGQLFHHLGSEITLMQRSERLLKEYDPEISESVEKAL 300
+LLELKK+PKRL VIGSGYIGMELGQLFH+LGSE+TL+QRSERLLKEYDPEISESVEK+L
Sbjct: 241 SLLELKKVPKRLWIGSGYIGMELGQLFHNLGSEVTLIQRSERLLKEYDPEISESVEKSL 300

Query: 301 IEQGINLWGATFERVEQSGEIKRVYVTVNGSREVIESDQLLVATGRKPNTDSLNLSAAG 360
+EQGINLVKGAT+ER+EQ+G+IK+V+V VNG + +IE+DQLLVATGR PNT +LNL AAG
Sbjct: 301 VEQjSINLVKGATYERIEQNGDIKKVHVEWGKKRIIEADQLLVATGRTPNTATLNLRAAG 360

Query: 361 VETGKiMNE IL INDi Gy 1 oNEKI YAAGD VTLGPQ FVYVAAYEGG 111 DNAIGGLNKKlDXiS 420
VE G EI+I+D+ +T+N +IYAAGDVTLGPQFVYVAAY+GG+ NAIGGLNKK++L
Sbjct: 361 VEIGSRGEIIIDDYSRTTlWRIYAAGDVTLGPQFvYVAAYQGGVAAPNAIGGLNKKLNLE 420

Query: 421 WPAVTFTNPTVATVGLTEEQAKEKGYDVKTSVLPLGAVPRAIVNRETTGVFKLVADAET 480
WP VTFT P +ATVGLTE+QAKE GY+VKTSVLPL AVPRA+ VNRETTGVFKLVAD+ +T
Sbjct: 421 WPGVTFTAPAIATVGLTEQQAKENGYEVKTSVLPLDAVPRALVNRETTGVFKLVADSKT 480

Query: 481 LKVLGVHIVSENAGDVIYAASLAVKFGLTIEDLTETLAPYLTMAEGLKLVALTFDKDISK 540
+KVLG H+V+ENAGDVI YAA+LAVKFGLT+ +D+ ETLAPYLTMAEGLKL ALTFDKDISK
Sbjct: 481 MKVLGAHWAENAGDVIYAATLAVKFGLTVDDIRETLAPYLTMAEGLKLAALTFDKDISK 540

Query: 541 LSCCAG 546
LSCCAG
Sbjct: 541 LSCCAG 546
There is also homology to SEQ ID 1820.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 584 

A DNA sequence (GBSx0624) was identified in S.agalactiae <SEQ ID 1821> which encodes the amino 
acid sequence <SEQ ID 1822>. This protein is predicted to be a regulatory protein. Analysis of this protein
sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4529 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA83973 GB:AF138877 mercury resistance operon negative 
regulator MerR1 [Bacillus sp. RC607]
Identities = 84/129 (65%) , Positives = 105/129 (81%) 

Query: 1 MIYRISEFADKCGVNKETIRYYERKNLLQEPHRTEAGYRIYSYDDVKRVGFIKRIQELGF 60 

M +RI E ADKCGVNKETIRYYER L+ EP RTE GYR+YS V R+ FIKR+QELGF 
Sbjct: 1 MKFRIGELADKCGVNKETIRYYERLGLIPEPERTEKGYRMYSQQTVDRLHFIKRMQELGF 60 

Query: 61 SLSEIYKLLGWDKDEVRCQDMFEFVSKKQKEVQKQIEDLKRIETMLDDLKQRCPDEKKL 120 

+L+EI KLLGWD+DE +C+DM++F K +++Q++IEDLKRIE ML DLK+RCP+ K + 
Sbjct: 61 TLNEIDKLLGWDRDEAKCRDMYDFTILKIEDIQRKIEDLKRIERMLMDLKERCPENKDI 120 

Query: 121 HSCPIIETL 129 

+ CPIIETL 
Sbjct: 121 YECPIIETL 129 
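
The "Identities" figure quoted for an alignment is the number of columns in which the query and subject carry the same residue. The short Python sketch below, added as an illustration with the hypothetical helper name count_identities, counts such columns for two already-aligned strings of equal length; it uses the final alignment block above as its example. It does not attempt to reproduce the "Positives" figure, which depends on a substitution scoring matrix not stated here.

```python
def count_identities(query: str, subject: str) -> int:
    """Count aligned columns where both sequences show the same residue.

    Both strings must already be aligned (equal length, '-' for gaps).
    Gap columns are never counted as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


if __name__ == "__main__":
    # Last block of the MerR1 alignment: Query 121-129 against Sbjct 121-129.
    print(count_identities("HSCPIIETL", "YECPIIETL"))
```

Summing this count over every block of an alignment and dividing by the total number of columns gives the identity fraction reported in the summary line.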

There is also homology to SEQ ID 1712. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 585 

A DNA sequence (GBSx0625) was identified in S.agalactiae <SEQ ID 1823> which encodes the amino 
acid sequence <SEQ ID 1824>. This protein is predicted to be Nramp metal ion transporter. Analysis of this 
protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood =-13.85 Transmembrane 175 - 191 ( 169 - 201)
INTEGRAL Likelihood =-11.94 Transmembrane 150 - 166 ( 132 - 173)
INTEGRAL Likelihood = -9.45 Transmembrane 491 - 507 ( 481 - 509)
INTEGRAL Likelihood = -8.92 Transmembrane 375 - 391 ( 374 - 396)
INTEGRAL Likelihood = -8.39 Transmembrane 72 - 88 ( 69 - 93)
INTEGRAL Likelihood = -7.96 Transmembrane 280 - 296 ( 274 - 299)
INTEGRAL Likelihood = -7.17 Transmembrane 413 - 429 ( 411 - 431)
INTEGRAL Likelihood = -6.79 Transmembrane 327 - 343 ( 322 - 346)
INTEGRAL Likelihood = -3.40 Transmembrane 444 - 460 ( 443 - 462)
INTEGRAL Likelihood = -3.24 Transmembrane 132 - 148 ( 132 - 149)
INTEGRAL Likelihood = -0.96 Transmembrane 115 - 131 ( 114 - 131)

Final Results

bacterial membrane --- Certainty=0.6540 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF83825 GB:AE003939 manganese transport protein [Xylella
fastidiosa]

Identities = 185/450 (41%) , Positives = 278/450 (61%) , Gaps = 29/450 (6%)

Query: 16 ANGPSLEEINGTIEVPKDLSFFKTLLAYSGPGALVAVGYMDPGNWSTSITGGQNFQYLLI 75
++ PSL E++ ++ V + + LLA+ GPG +V+VGYMDPGNW+T + GG F Y+L+
Sbjct: 35 SDSPSLGEMHASVAVSRRGHWGFRLLAFLGPGYMVSVGYMDPGNWATGLAGGSRFGYMLL 94

Query: 76 SIILMSSLIAMLLQYMSAKLGIVTQMDLAQAIRARTSKQLGIVLWILTELAIMATDIAEV 135
S+IL+S+++A++LQ ++A+LGI + MDLAQA RAR S+ + LW++ ELAI+A D+AEV
Sbjct: 95 SVILLSNVMAIVLQALARRLGIASDMDLAQACRARYSRGTTLALWWCELAIIACDLAEV 154

Query: 136 IGGAIALYLLFHIPLAIAVFITVFDVLLLLLLTKIGFRKIEALWALILVIFLVFAYQVA 195
IG AIAL LL +P+ V IT DV+L+LLL GFR +EA V+AL+LVIF F Q+
Sbjct: 155 IGTAIALNLLLGVPIIWGWITAVDVVLVLLLMHRGFRALEAFVIALLLVIFGCFWQIV 214

Query: 196 LSHPIWTDIFKGLVPTSEAFSTSHTVNGQTPLSGALGIIGATVMPHNLYLHSSWQSRKL 255
L+ P ++ G VP + V L A+GI+GATVMPHNLYLHSS+VQ+R
Sbjct: 215 LAAPPLQEVLGGFVPRWQ WADPQALYLAIGIVGATVMPHNLYIiHSS IVQTRAY 268

Query: 256 DHNNKKDIAR--AIRFSTFDSNIQLTVAFFVNSLLLIMGVAVFKTGSVTDPSFFGLFKAL 313
+ + R A+R++ DS + L +A F+N+ +LI+ AVF D
Sbjct: 269 P RTPVGRRSALRWAVADSTLALMLALFINASILILAAAVFHAQHHFD 315

Query: 314 SNSTIMSNSILAHIASSGILSLLFAIALIASGQNSTITGTLTGQIIMEGFIHMKVPIWFR 373
+ +LA + G+ + IiFA ALLASG NST+T TL GQI+MEGF+ +++ W R
Sbjct: 316 VEEIEQAYQLLAPVLGVGVAATLFATALLASGINSTVTATLAGQIVMEGFLRLRLRPWLR 375

Query: 374 RIITRLISVIPVMICVLVTSGRSTVEEHIAINNLMNNSQVFLAFALPFSMLPLLIFTNSK 433
R++TR ++++PV++ V + + T L+ SQV L+ LPF+++PLL +
Sbjct: 376 RVLTRGLAIVPVIVWALYGEQGT GRLLLLSQVI LSMQLPFAVI PLLRCVADR 428

Query: 434 VEMDDDFKNTWIIKILGWLSVIGLIYLNMK 463
M W++ ++ WL ++ LN+K
Sbjct: 429 KVMGALVAPRWLM-VVAWLIAGVIVVLNVK 457

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 586 

A DNA sequence (GBSx0626) was identified in S.agalactiae <SEQ ID 1825> which encodes the amino 
acid sequence <SEQ ID 1826>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2590 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
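
Each "Final Results" block in these examples lists three candidate localisations with a certainty value, and the highest-scoring one is the predicted location. A minimal Python sketch of reading such a block is given below. It is an illustration only: the function names are hypothetical, and the pattern is deliberately tolerant of stray spaces inside the number (for example "0 . 2590"), which is an assumption about how the text may have been transcribed rather than a property of the prediction program.

```python
import re

# "bacterial <location> ... Certainty=<number> (<verdict>)"; the number may
# contain stray whitespace, which is removed before conversion.
_RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d\s.]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for location, raw, verdict in _RESULT.findall(text):
        certainty = float(re.sub(r"\s+", "", raw))
        results[location] = (certainty, verdict.strip())
    return results


def most_likely_location(text):
    """Return the location with the highest certainty value."""
    results = parse_final_results(text)
    return max(results, key=lambda loc: results[loc][0])


if __name__ == "__main__":
    block = """bacterial cytoplasm --- Certainty=0.2590 (Affirmative)
bacterial membrane --- Certainty=0.0000 (Not Clear)
bacterial outside --- Certainty=0.0000 (Not Clear)"""
    print(parse_final_results(block))
    print(most_likely_location(block))
```

When two locations tie, max returns the first one encountered, so ties should be inspected by hand.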

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 587 

A DNA sequence (GBSx0627) was identified in S.agalactiae <SEQ ID 1827> which encodes the amino 
acid sequence <SEQ ID 1828>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence
94
(
122
INTEGRAL Likelihood = -6.42 Transmembrane 159 - 175 ( 155 - 188)
INTEGRAL Likelihood = -4.78 Transmembrane 54 - 70 ( 51 - 72)
INTEGRAL Likelihood = -2.97 Transmembrane 18 - 34 ( 15 - 36)



Final Results

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB16051 GB:Z99124 yydj [Bacillus subtilis] 
Identities = 97/239 (40%) , Positives = 154/239 (63%) , Gaps = 3/239 (1%) 

Query: 4 LEFRKSIRGRTLFYI1STVALTYVLGYILPVGIDKIRHLTLGEFYFSTYTVFTQFGFLIF 63 
LEF+KSI + + + + ++LGY L VGIDK+ ++T F+FS+YTV TQFG ++F

Sbjct: 3 LEFKKSI SNKVI I ILGAMFVFLFLLGYFLLVGIDKVSNVTPEMFFFSSYTOATQFGLMLF 62 

Query: 64 GFVIVYFFNKDYSDKCILYHYFSGYHLTKYFYTKLLVLFSEFFIAIIVCNILASLLWGYS 123 
FVI +F N++YS+K IL++ G ++ +FY K+ VLF E F 1 + ++ SL++ + 
Sbjct: 63 SFVIAFFINREYSNKNILFYKLIGENIYTFFYKKIA\7LFLECFAFITLGLLIISLMY-HD 121

Query: 124 LFYFLTTTILFSLWLQYLLWSTISILFSNMLVSIGVTIFYWITSIILVAIGG-IFKVS 182 

+F LFS V+LQY+L++ TIS+L N+L+SIGV+I YW+TS+ILVAI F 

Sbjct: 122 FSHFALLLFLFSAVILQYILIIGTISVLCPNILISIGVSIVYWMTSVILVAISNKTFGFI 181 

Query: 183 Al FDASNSLYKI I GK- LFSHPMTIDLTDFFI I VPYMI CLSVI SFLI VCLSNRRWLIiNGM^ 240 

A F+A N++Y I + L S MT+ D 1+ Y++ + +1+ +++ S RW+ G+" 
Sbjct: 182 APFEAGNTMYPRIERVLQSDNMTLGSND VLFIILYLVSI 1 1 INAIVLRFSKTRWIKMGL 240 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 588 

A DNA sequence (GBSx0628) was identified in S.agalactiae <SEQ ID 1829> which encodes the amino 
acid sequence <SEQ ID 1830>. This protein is predicted to be antibiotic epidermin immunity protein F.
Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2901 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB16052 GB:Z99124 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 100/209 (47%) , Positives = 150/209 (70%) , Gaps = 4/209 (1%) 

Query: 1 MFINNYTLKIGNRILLENTNLDFEEGEINHLLGRNGSGKSQLAKDFIINRGNYFSNDIYE 60

M I NYTLK+ + LL++T+L F G+INH++G+NG GKSQLAKDF++N DI + 

Sbjct: 1 MNIANyTLKVKGKTLLQDTDLHFSSGKINHWGKNGVGKSQIAKDFLLNNSKRIGRDIRQ 60 

Query: 61 DTLIISSYSNLPSDVT INDLERTIPWKLSKEIYQLLNINQISKTVKLKQLSDGQKQ 116 

+ +ISS SN+P+DV+ ++ L + K+ +1 LWI++ I V +K LSDGQKQ

Sbjct: 61 NVSLISSSSNIPNDVSKDFLLHFLSKKFDAKMIDKIAYLLNLDNIDGKVLIKNLSDGQKQ 120 

Query: 117 KVKLLVIjLSLDKHIIILDEITNALDKKSVDEINVFLQNYIQYYPEKIIINISHDINNIRS 176 
K+KLL L DK+ I I +LDE I TN+LDKK+V EI+ FL YIQ PEKIIINI+HD++++++ 
Sbjct: 121 KLKLLSFLLEDKNIIVIjDEITNSLDKICIVIEIHGFIjNICYIQENPEKIIINITHDLSDLKA 180

Query: 177 LKGNYFLIDNQKI CKVDTLDDAI SWYLGE 205 

++G+Y++ ++Q+I + ++D I Y+ E 
Sbjct: 181 IEGDYYIFNHQEIQQYHSVDKLIEVYINE 209

A related DNA sequence was identified in S. pyogenes <SEQ ID 1831> which encodes the amino acid
sequence <SEQ ID 1832>. Analysis of this protein sequence reveals the following:

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2760 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 49/174 (28%) , Positives = 82/174 (46%) , Gaps = 27/174 (15%) 

Query: 3 INNYTLKIGJSRILLENTNLDFEEGEINHLLGRNGSGKSQLAK DFIINRGN 52

IN G R +L N N++ +G++ L+G NG+GKS + K II G 

Sbjct: 23 IQNLKKSYGKRTILNNVNMNI PKGKVYALIGPNGAGKSTIMKILTGLVSKTSGS I I FEGR 82 

Query: 53 YFS NDIYEDTLI ISSYSNLPSDVTINDL-ERTIPWKLSKEIYQLIiNINQI 101

+S I E+ + +S+Y N+ T+ + E TI L+K + + I

Sbjct: 83 EWSRRDLRKIGSIIEEPPLYKNLSAYDNMKWTTMLGVSESTILPLLNK VGLGNI 137 

Query: 102 SKTVKLKQLSDGQKQKVKLL VLLSLDKHI 1 1 LDE ITNALDKKS VDE INVFLQNY 155 
K +KQ S G KQ++ + + L ++ILDE TN LD + E+ ++++ 

Sbjct: 138 DKR-PVKQFSLGMKQRLGIAISLINSPKLLILDEPTNGLDPIGIQELREIIESF 190

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 589 

A DNA sequence (GBSx0629) was identified in S.agalactiae <SEQ ID 1833> which encodes the amino
acid sequence <SEQ ID 1834>. This protein is predicted to be aminoglycoside 6-adenylyltransferase. 
Analysis of this protein sequence reveals the following: 






Possible site: 33 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.1780 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA29839 GB:X06627 ORF (str) [Staphylococcus aureus]
Identities = 91/289 (31%) , Positives = 146/289 (50%) , Gaps = 14/289 (4%) 

Query: 1 MRDEQEIYNLVLNIANQDKRIEAVLLNGSRANPNVPKDDFQDYDIVFV'rNFIEDIISDTN 60

MR E+EI NLV A Q ++ + L GSR N N+ KD FQDYD F + IE + + 
Sbjct: 1 MRTEKEILNLVSEFAYQRSNVKIIALEGSRTNENIKKDKFQDYDFAFFVSDIEYFTHEES 60 

Query: 61 YHKKFGDILIMQKPNE FRNKTEYNCFAYLMQFQDLTRIDLRLIKPEFLEDYLDDA- - 115 

+ FG++L +QKP + F +Y ++Y+M F+D ++D+ LI + L Y D+

Sbjct: 61 WLSLFGELLFIQKPEDMELFPPDLDYG-YSYIMYFKDGIKMDITLINLKDLNRYFSDSDG 119 

Query: 116 FSKVlLDKKNKyLDYNFERSSLYETKQLSEDEINOLNEIYWVSTYVVKGIARNDIIYSE 175 
K+L+DK N S Y K+ +E E NE + VSTYV KG+ R +I+++ 

Sbjct: 120 LVKILVBKDISnjVTQEIVPDDSNYWLKKPTEREFYDCCNEFWSVSTYVAKGVFRREILFAL 179



Query: 



176 FMISNPIKNAFIKLLKQKILIEKELDSLSFGKLDKDILQYITDKD--QLLKIFSNKSLKD 233 
+N ++ ++++ I + D S GK K I +Y+TDK+ LL F +
Sbjct: 180 DHFNNILRPELLRMISWYIGFNRGFD-FSLGKNYKFINKYLTDKEFNMLLATFEMNGYRK 238



Query: 234 IEftNLRFLLDETNQMAKyiSINRKLNLNQGEYQSAMKFMNIFLSNSYQN 282 
++ ++KYSN+ L Y+K+F+ N+Y+N 



Sbjct: 239 TYQSFKLCC ELFKYYS -NKVSCLGNYNYPNYEKNIENFIRNNYEN 282 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8607> and protein <SEQ ID 8608> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: -5.26 
GvH: Signal Score (-7.5): -6.14 

Possible site: 33 
>>> Seems to have no N- terminal signal sequence 
ALOM program count: 0 value: 6.10 threshold: 0.0 
PERIPHERAL Likelihood = 6.10 151 
modified ALOM score: -1.72 

*** Reasoning Step: 3

Final Results

bacterial cytoplasm --- Certainty=0.1780 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

31.0/53.4% over 281aa
Staphylococcus aureus
EGAD|9462| streptomycin resistance protein Insert characterized
SP|P12055|STR_STAAU STREPTOMYCIN RESISTANCE PROTEIN. Insert characterized
GP|46644|emb|CAA29839.1||X06627 ORF (str) Insert characterized
PIR|S00938|S00938 str protein - plasmid pS194 Insert characterized

ORF00399(301 - 1146 of 1452)

EGAD|9462|9267(1 - 282 of 282) streptomycin resistance protein {Staphylococcus aureus}
SP|P12055|STR_STAAU STREPTOMYCIN RESISTANCE PROTEIN. GP|46644|emb|CAA29839.1||X06627 ORF
(str) {Staphylococcus aureus} PIR|S00938|S00938 str protein - Staphylococcus aureus plasmid
pS194

%Match =12.8

%Identity = 31.0 %Similarity =53.4

Matches = 87 Mismatches = 125 Conservative Sub.s = 63

117 147 177 207 237 267 297 327

* *LMTY*H*TVENIWNHNQLLRKI *N* ILGGRKGMSMLI * VYDYMLREKYKGNIKVLEXTW* YKVK*EVAIMRDEQEI YN

MRTEKEILN

357 387 417 447 477 507 558 

LVLNIANQDKRIEAVLMGSRANPOTPKDDFQDYDIVFVTNFIEDIISDTNYHKKFGDILIMQKPNEFR- - -NKTEYNCF 



LVSEFAYQRSNVKI IALEGSRTNENIKKDKFQDYDFAFFVSDIEYFTHEESWLSLFGELLFIQKPEDMELFPPDLDYG-Y 
20 30 40 50 60 70 80 



588 618 672 702 732 762 792 

AYLMQFQDLTRIDLRLIKPEFLEDYLDDA--FSKVLLDKl<NKYLDYNFERSSLYETKQLSEDEINKIIiNEIYWSTYVVK 



SYIMYFKDGIKMDITLINLKDIjNRYFSDSDGLVKILVDKDNLVTQ 

100 110 120 130 140 150 160 



822 852 882 912 942 966 996 1026 

GIARNDIIYSEFMISNPIKNAFIKLLKQKILIEKELDSLSFGKLDKDILQYITDKD--QLLKIFSNKSLKDIEANLRFLL 



gvfrreilfaldhfnnilrpellrmiswyigfnrgfd-fslgknykfinkyltdkefnmllatfemngyrktyqsfklcc 

180 190 200 210 220 230 240 
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1056 1086 1116 1146 1176 1206 1236 1266 

DETNQMAKYISINRKIjNLNOGEYQSAMKFIW^ 

: :| I I = I = h 1=1=1 

5 ELFKYYSNKVS CLGNYNYPNYEKNIENFIRNNYEN 

260 270 280 

SEQ ID 1834 (GBS46) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 11 (lane 6; MW 34.9kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 3; MW 59.8kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
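Each "Final Results" block in these analyses lists three candidate locations with a certainty score and a verdict. As an illustration only, the sketch below reads such a block and reports the location with the highest certainty. The names `LINE`, `parse_final_results` and `best_location` are hypothetical and not part of the prediction software; the pattern accepts scores whose digits were split by spaces during text extraction.

```python
import re

# One result line: location, certainty score, verdict in parentheses.
LINE = re.compile(
    r"bacterial\s+(\w+)\s+Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for every matching line."""
    results = {}
    for m in LINE.finditer(text):
        score = float(m.group(2).replace(" ", ""))  # tolerate "0 . 1780"
        results[m.group(1)] = (score, m.group(3).strip())
    return results


def best_location(results):
    """Return the location with the highest certainty score."""
    return max(results, key=lambda loc: results[loc][0])


block = """
bacterial cytoplasm Certainty=0.1780 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(best_location(results), results[best_location(results)])
```

The values in `block` are those reported above for the GBSx0629-related protein; any other block in this section can be substituted.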

Example 590 

A DNA sequence (GBSx0630) was identified in S.agalactiae <SEQ ID 1835> which encodes the amino
acid sequence <SEQ ID 1836>. Analysis of this protein sequence reveals the following:

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1179 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 591 

A DNA sequence (GBSx0631) was identified in S.agalactiae <SEQ ID 1837> which encodes the amino
acid sequence <SEQ ID 1838>. Analysis of this protein sequence reveals the following:

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.81 Transmembrane 177 - 193 ( 177 - 194)
INTEGRAL Likelihood = -0.27 Transmembrane 129 - 145 ( 129 - 145)

Final Results

bacterial membrane Certainty=0.2126 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8609> which encodes amino acid sequence <SEQ ID 8610> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -19.59
GvH: Signal Score (-7.5): -4.49

Possible site: 44
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -2.81 threshold: 0.0

INTEGRAL Likelihood = -2.81 Transmembrane 172 - 188 ( 172 - 189)
INTEGRAL Likelihood = -0.27 Transmembrane 124 - 140 ( 124 - 140)

PERIPHERAL Likelihood = 8.01 30
modified ALOM score: 1.06
*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.2126 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
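The INTEGRAL lines in the ALOM output above each carry a likelihood value, a core transmembrane span and an extended span in parentheses. A minimal sketch of how such lines might be tabulated follows; `TM` and `parse_segments` are hypothetical names used for illustration, and the sample text is the pair of segments reported for SEQ ID 1838. In the blocks quoted in this section, the "ALOM program count" equals the number of INTEGRAL lines and the "value" equals the lowest likelihood, which the sketch checks for this example.

```python
import re

# One INTEGRAL line, e.g.
# "INTEGRAL Likelihood = -2.81 Transmembrane 177 - 193 ( 177 - 194)"
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return predicted segments sorted by the start of the core span."""
    segments = []
    for m in TM.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        })
    return sorted(segments, key=lambda s: s["core"][0])


sample = """
INTEGRAL Likelihood = -2.81 Transmembrane 177 - 193 ( 177 - 194)
INTEGRAL Likelihood = -0.27 Transmembrane 129 - 145 ( 129 - 145)
"""
segments = parse_segments(sample)
count = len(segments)                              # compare with "ALOM program count"
lowest = min(s["likelihood"] for s in segments)    # compare with "value"
for s in segments:
    start, end = s["core"]
    print(f"{start}-{end} ({end - start + 1} residues), likelihood {s['likelihood']}")
```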

Example 592 

A DNA sequence (GBSx0632) was identified in S.agalactiae <SEQ ID 1839> which encodes the amino
acid sequence <SEQ ID 1840>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10223> which encodes amino acid sequence <SEQ ID
10224> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB49414 GB:AJ248284 hypothetical protein [Pyrococcus abyssi]
Identities = 29/86 (33%), Positives = 52/86 (59%), Gaps = 4/86 (4%)

Query: 14 TYYILLALFE--EAHGYAIMQKVEEMSGGDVRIAAGTMYGAIENLLKQKWIKSIPSD--D 69
+Y ILL L E + HGYAI +++EE++ G + + G +Y ++ L K K ++ ++
Sbjct: 19 SYLILLILNENEKLHGYAIRKRLEELTDGKLVPSEGALYSILKMLKKYKLVEDYWAEVGG 78

Query: 70 RRRKVYIITETGKEIVELETNRLRKL 95
R R+ Y ITE GKE+++ +R++
Sbjct: 79 RVRRYYQITELGKEVLDEIKEEIREI 104

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 593 

A DNA sequence (GBSx0633) was identified in S.agalactiae <SEQ ID 1841> which encodes the amino 
acid sequence <SEQ ID 1842>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0510 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



WO 02/34771 



-675- 



PCT/GB01/04789 



A related GBS nucleic acid sequence <SEQ ID 10225> which encodes amino acid sequence <SEQ ID 
10226> was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF22299 GB:AF185571 putative N-acetyltransferase Camello 2
[Homo sapiens]
Identities = 32/110 (29%), Positives = 54/110 (49%), Gaps = 4/110 (3%)

Query: 67 IKMAEQDDIFQIEIOTQIffiKGQ-FWIALENEKVVGSIALLRIDDKTAVLKKFFTYPKYRG 125
+ +A + D+ I Y + G FW+A EKVVG++ L +DD T K+ +
Sbjct: 86 VDIALRTDMSDITKSYLSECGSCFWVAESEEKVVGTVGALPVDDPTLREKRLQLFHLSVD 145

Query: 126 NPVR LGRKLFERFMLFARASKFTRIVLDTPEKEKRSHFFYENQGFKQ 172
N R + + L + FAR ++ +VLDT + + Y++ GFK+
Sbjct: 146 NEHRGQGIAKALVRTVLQFARDQGYSEVVLDTSNIQLSAMGLYQSLGFKK 195

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 594

A DNA sequence (GBSx0634) was identified in S.agalactiae <SEQ ID 1843> which encodes the amino 
acid sequence <SEQ ID 1844>. Analysis of this protein sequence reveals the following: 



Possible site: 47

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-11.94 Transmembrane 159 - 175 ( 151 - 180)
INTEGRAL Likelihood =-11.62 Transmembrane 231 - 247 ( 225 - 251)
INTEGRAL Likelihood = -9.98 Transmembrane 182 - 198 ( 177 - 203)
INTEGRAL Likelihood = -7.11 Transmembrane 118 - 134 ( 106 - 136)
INTEGRAL Likelihood = -1.49 Transmembrane 74 - 90 ( 74 - 93)

Final Results

bacterial membrane Certainty=0.5776 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10227> which encodes amino acid sequence <SEQ ID
10228> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15891 GB:Z99123 yxlG [Bacillus subtilis]
Identities = 42/188 (22%), Positives = 94/188 (49%), Gaps = 4/188 (2%)

Query: 1 MKSLAVMLKKEWMENVRTYKVISILITCSIFGILGPLTALMMPDIMA--GILPKKLQGAI 58
MK + +L+KEW+E ++ K+I + I I G+ PLT MP+I+A G LP ++ +
Sbjct: 1 MKVIWALLQKEWLEGWKSGKLIWLPIAMMIVGLTQPLTIYYMPEIIAHGGNLPDGMKISF 60

Query: 59 PEPTYIDSYIQYFKNMNQLGLVILVFLFSSTLTQEFSKGTLINLVTKGLAKKVIILAKFI 118
P+ + + N LG+ +++F ++ E ++G ++++ + I++K++
Sbjct: 61 TMPSGSEVWSTLSQFNTLGMALVIFSVMGSVANERNQGVTALIMSRPVTAAHYIVSKWL 120

Query: 119 VITLLWTVSYLLSWIHFSYTLYYFSNEGSHKLMVYGATWFIGILFI-SLILFFSVLFRK 177
+ +++ +S+ ++Y F+ + ++ ++FI + L S +FR
Sbjct: 121 IQSVIGIMSFAAGYGLAYYYVRLLFEDASFSRFAASLGLYALWVIFI VTAGLAGSTIFR- 179

Query: 178 TLGGLLGC 185
++G C
Sbjct: 180 SVGAAAAC 187

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 595 

A DNA sequence (GBSx0635) was identified in S.agalactiae <SEQ ID 1845> which encodes the amino 
acid sequence <SEQ ID 1846>. This protein is predicted to be ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3431 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10229> which encodes amino acid sequence <SEQ ID
10230> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB12736 GB:Z99108 similar to ABC transporter (ATP-binding
protein) [Bacillus subtilis]
Identities = 105/299 (35%), Positives = 175/299 (58%), Gaps = 11/299 (3%)

Query: 4 ISFQNVTKSFGPKKILNNVSFDLEENMIYGFVGPNGAGKTTTIKMILGLLKFDTGFITIF 63
+ +NVTK+ + I++++SF + E ++GF+GPNGAGKTTTI+M++GL+K G + I
Sbjct: 5 LELKNVTKNIRGRTIIbDLSFTIREGEVFGFLGPNGAGKTTTIRMVVGLMKLSKGDVLIC 64

Query: 64 GKKVNFGRTDTNQLIGYLPDVPEYYDYMTALEYLDLCSGLARSKHKLSNKELLRSVGLDD 123
G+ + + IG + + PE Y +++ + L + + + K E++ VGL D
Sbjct: 65 GQSITKEYAKAIKHIGAIVENPELYKFLSGYKNLQQFARMVXGVTKEKIDEVVELVGLTD 124

Query: 124 N-HQKIATYSRGMKQRLGLAQALVHDPK1IICDEPTSALDPKGRQDILDIISNLRGEK-- 180
H K+ TYS GM+QRLGLAQ L+HDPK++I DEPT+ LDP G ++I D + L E+
Sbjct: 125 RIHDKVKTYSLGMRQRLGLAQCLLHDPKVLILDEPTNGLDPAGIREIRDHLKKLTRERGM 184

Query: 181 TVIFSTHILSDVEKICDHVLVLTKCGIYSLEELKGKKSEENYSTOILIKVTKSEAKVLSH 240
VI S+H+LS++E +CD + +L K + ++ +K + +EN + ++ SEA + +
Sbjct: 185 AVIVSSHLLSEMELMCDRIAILQKGKLIDIQNVKDENIDENDTYFFQVE-QPSEAATVLN 243

Query: 241 NYQIEKKDNEYALTLKGSKMDNKADLLAGFYQDLVSLKISPSAIEVIDNSLEELYLEVT 299
Y + K N + L ++ +L LV +I ++VI SLE+ +LE+T
Sbjct: 244 QYDLLSKTNGVEI KLAKEEVPAVI EL : LVMQQIRIYEVKVITKSLEDRFLEMT 295



There is also homology to SEQ ID 686. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 596 

A DNA sequence (GBSx0636) was identified in S.agalactiae <SEQ ID 1847> which encodes the amino 
acid sequence <SEQ ID 1848>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4040 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB71491 GB:U53767 ORF6 [Bacillus pumilus]
Identities = 39/134 (29%), Positives = 71/134 (52%), Gaps = 16/134 (11%)

Query: 2 LGENIYLQRTQIGMTQENLSDYLHLTKTTISKWENNQAKPDIDYLILMANLFDISLDDLV 61
LG NI +R + ++QE +++ L +++ ISKWE NQ++P +D LI +A LFD + +LV
Sbjct: 4 LGSNISNKRKSLKLSQEYVAEQLGVSRQAISKWETNQSEPSMDNLIRIAELFDSDIKELV 63

Query: 62 GYQKTLSDDQRNQLIKDLKIKANVLSERDFFQEVKELSKQFPNDFKTLLIMINM--VLSN 119
S +Q ++ KDL+ + K++ Q F +L++I+ +
Sbjct: 64 SPEQYSEEQKDLETRIE HGQKDIKMQMSAVFGRILMLISFFGYIGA 109

Query: 120 LTNLNDSEMKEWSL 133
L +L+ ++ W L
Sbjct: 110 LFDLSSYQLPIWXL 123

There is also homology to SEQ ID 1740.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 597

A DNA sequence (GBSx0637) was identified in S.agalactiae <SEQ ID 1849> which encodes the amino 
acid sequence <SEQ ID 1850>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood =-13.59 Transmembrane 152 - 168 ( 145 - 173)
INTEGRAL Likelihood = -9.71 Transmembrane 7 - 23 ( 3 - 27)
INTEGRAL Likelihood = -6.95 Transmembrane 125 - 141 ( 122 - 146)
INTEGRAL Likelihood = -4.51 Transmembrane 85 - 101 ( 83 - 102)
INTEGRAL Likelihood = -3.35 Transmembrane 55 - 71 ( 54 - 75)

Final Results

bacterial membrane Certainty=0.6434 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA79986 GB:Z21972 ORF2 [Bacillus megaterium]
Identities = 51/186 (27%), Positives = 106/186 (56%), Gaps = 5/186 (2%)

Query: 5 SFFQCVILLVSFLVLTLAVKSQSDMISYLDNITSAFFQSIRNPDLTNLMTIISTWSPLT 64
+F V+ L+ F + + S ++ + + +++ S Q +P LT++M + + S +
Sbjct: 10 AFIISVLSLIGFSFMAFTI-SANEYLKFDEDVIS-LVQGWESPLLTDIMKFFTYIGSTAS 67

Query: 65 TSLIALVILGYQY-FLNQRIAVWLFM-LFFGTNALALLLKDIIARHRP-MNQLVFDSGYS 121
+++LVIL + Y L R+ + LF + G+ L L++K R RP +++L+ GYS
Sbjct: 68 LIILSLVILFFLYRILKHRLELVLFTAVMVGSPLLNLMVKLFFQRARPDLHRLIDIGGYS 127

Query: 122 FPSGHTISAFLLMILVLWARQRLRRVLSQWFVIFALVILASVIFSRLYLENHFLTDIL 181
FPSGH ++AF L ++ + + + ++++ ++F+++++ S+ SR+YL H+ +DI+
Sbjct: 128 FPSGHAMNAFSLYGILTFLLWRHITARWARILLILFSMLMILSIGISRIYLGVHYPSDII 187

Query: 182 GSLLLG 187
L G
Sbjct: 188 AGYLAG 193

There is also homology to SEQ ID 1852.

A related GBS gene <SEQ ID 8611> and protein <SEQ ID 8612> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: 11.91
GvH: Signal Score (-7.5): -4.6

Possible site: 20
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 5 value: -13.59 threshold: 0.0

INTEGRAL Likelihood =-13.59 Transmembrane 152 - 168 ( 145 - 173)
INTEGRAL Likelihood = -9.71 Transmembrane 7 - 23 ( 3 - 27)
INTEGRAL Likelihood = -6.95 Transmembrane 125 - 141 ( 122 - 146)
INTEGRAL Likelihood = -4.51 Transmembrane 85 - 101 ( 83 - 102)
INTEGRAL Likelihood = -3.35 Transmembrane 55 - 71 ( 54 - 75)

PERIPHERAL Likelihood = 1.16 184
modified ALOM score: 3.22

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.6434 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF01359(313 - 864 of 1212)

EGAD|16772|16959 (10 - 194 of 216) hypothetical protein {Bacillus megaterium}
GP|288301|emb|CAA79986.1||Z21972 ORF2 {Bacillus megaterium} PIR|S32217|S32217 hypothetical
protein 2 - Bacillus megaterium

%Match = 9.5
%Identity = 28.2 %Similarity = 60.1
Matches = 53 Mismatches = 68 Conservative Sub.s = 60

66 96 126 156 186 216 246 276
SFFIEFTHPFLIICNIHYSLRFKYIVAILLY**KFER*LIGKA/RIWYFF*FVNSHI*T*

306 336 366 396 426 456 486 516
SLLK*GYVWKKSFFQCVILLVSFLVLTLAVKSQSDMISYLDNITSAFFQSIRNPDLTNLMTIISTWSPLTTSLIALVI
:| |: |: | : : : | :: : : ::: :: | :| ||::| : : | : :::|||
MKLKQQLTIAFIISVLSLIGFSFMAFTI-SANEYLKFDEDV-ISLVQGWESPLLTDIMKFFTYIGSTASLIILSLVI
10 20 30 40 50 60 70

543 570 600 630 657 687 714 744
LGYQY-FLNQRIAWLFM-LFFGTNALALLLKDIIARHRP-MNQLVFDSGYSFPSGHTISAFLLM-ILVLWARQRLRRV
LFFLYRILKHRLELVLFTAVMVGSPLLNLMVKLFFQRARPDLHRLIDIGGYSFPSGHAMNAFSLYGILTFLLWRH-ITAR
90 100 110 120 130 140 150

774 804 834 864 894 924 954 984
LSQWFVIFALVILASVIFSRLYLENHFLTDILGSLLLGASSYYGLSAIVSLKELQ*K**LPMNYKRAFLKGSFIIHYFS
WARILLILFSMLMILSIGISRIYLGVHYPSDIIAGYLAGGCWIAISIWFFQRYQDRRKNKDR
170 180 190 200 210

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 598 

A DNA sequence (GBSx0638) was identified in S.agalactiae <SEQ ID 1853> which encodes the amino
acid sequence <SEQ ID 1854>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4288 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15470 GB:Z99121 yvdC [Bacillus subtilis]
Identities = 53/96 (55%), Positives = 70/96 (72%)

Query: 1 MDITDYQKWVSEFYKKRNWYQYNSFIRSNFLSEEVGELAQAIRKYEIGRDRPDETEQTDL 60
M + D +KW+ EFY+KR W +Y FIR FL EE GELA+A+R YEIGRDRPDE E +
Sbjct: 1 MQLADAEKWMKEFYEKRGWTEYGPFIRVGFLMEEAGELARAVRAYEIGRDRPDEKESSRA 60

Query: 61 ENLNDIKEELGDVLDNIFILADQYNISLEEIISAHR 96
E ++ EE+GDV+ NI ILAD Y +SLE+++ AH+
Sbjct: 61 EQKQELIEEMGDVIGNIAILADMYGVSLEDVMKAHQ 96

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 599 

A DNA sequence (GBSx0639) was identified in S.agalactiae <SEQ ID 1855> which encodes the amino 
acid sequence <SEQ ID 1856>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0635 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB06803 GB:AP001517 unknown conserved protein [Bacillus halodurans]
Identities = 83/186 (44%), Positives = 117/186 (62%)

Query: 1 MRITIFCGASTGENPVYSEKTVAIAQWMAQNKHSLVYGGGKVGLMGVMADTVIANGGYTT 60
M+I +FCG+S G + VY E L + +A+ +LVYGG VG+MG +AD+V+ GG
Sbjct: 1 MKIAVFCGSSNGASDVYKEGARQLGKELARRGITLVYGGASVGIMGAVADSVLEAGGEVI 60

Query: 61 GVIPTFLRDREIAHENLSELIIVNNMPERKAKMMLLGDAFIALPGGPGTLEEISEVISWS 120
GV+P FL + EI+H +L++LI+V M ERKAKM L D F+ALPGGPGTLEE E+ +W+
Sbjct: 61 GVMPRFLEEPEISHPHLTKLIVVETMHERKAKMAELADGFLALPGGPGTLEEFFEIFTWA 120

Query: 121 RIGQNDNPCILYNVNGYFNDLKNMFDHMVGEGFLSLEDRENVLFSDDITEIEDFITNYKV 180
+IG + PC L N+N YF+ L + HMEFL+R L D +D+Y+
Sbjct: 121 QIGLHQKPCGLLNINHYFDPLVTLLHHMSNEQFLHEKYRS^lALVHTDPILLLDQFSTYEP 180

Query: 181 PSTRQY 186
P+ + Y
Sbjct: 181 PTVKAY 186

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 600

A DNA sequence (GBSx0640) was identified in S.agalactiae <SEQ ID 1857> which encodes the amino 
acid sequence <SEQ ID 1858>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.86 Transmembrane 222 - 238 ( 214 - 239)
INTEGRAL Likelihood = -6.69 Transmembrane 39 - 55 ( 36 - 58)
INTEGRAL Likelihood = -4.25 Transmembrane 266 - 282 ( 266 - 284)
INTEGRAL Likelihood = -1.28 Transmembrane 166 - 182 ( 166 - 182)
INTEGRAL Likelihood = -1.01 Transmembrane 190 - 206 ( 190 - 206)
INTEGRAL Likelihood = -0.96 Transmembrane 70 - 86 ( 70 - 86)

Final Results

bacterial membrane Certainty=0.4142 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB12420 GB:Z99107 ydiL [Bacillus subtilis]
Identities = 40/132 (30%), Positives = 63/132 (47%), Gaps = 8/132 (6%)

Query: 107 ESQNYDATFNI LMISYSVWGPFFEEVLYRGIVLNLL-SKYGKWFAIITSGILFG 160
ES+N A ++ LMI S +VGP EE+++R I+ L K +FA + S ++FG
Sbjct: 114 ESENTQAILDVIQAVPLMIIVSSIVGPILEEIIFRKIIFGALYEKTNFFFAGLISSVIFG 173

Query: 161 LMHQDISQLLTTSIAGIIMGFI-AYHYSFKVALLLHICNNFIVEIFTQLSTVNELYGTYF 219
++H D+ LL + G F+ A V + H+ N V + QL V
Sbjct: 174 IVHADLKHLLLYTAMGFTFAFLYARTKRIWVPIFAHLMMNTFV-VIMQLEPVRNYLEQQS 232

Query: 220 ENILLILAILFI 231
+ LI+ LF+
Sbjct: 233 TQMQLIIGGLFL 244

No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8613> and protein <SEQ ID 8614> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 

McG: Discrim Score: 12.52 

GvH: Signal Score (-7.5): -1.74 
Possible site: 19 

>>> Seems to have a cleavable N-term signal seq. 

ALOM program count: 2 value: -6.69 threshold: 0.0 

INTEGRAL Likelihood = -6.69 Transmembrane 39 - 55 ( 36 - 58) 
INTEGRAL Likelihood = -0.96 Transmembrane 70- 86 ( 70- 86) 
PERIPHERAL Likelihood =4.56 21 
modified ALOM score: 1.84 



*** Reasoning Step: 3 



Final Results

bacterial membrane Certainty=0.3675 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

Query: 10 LIGLILLAQAIVLSLATTLFAEILQNDVWIGIASTLIALLIPCF 53 

L+ L LL ++++LS++ +L +W+ +A+ L+A ++ CF 

Sbjct: 21 LLCLCLLWSLLLSVSLYSALILLVI.ILWVTVATPLLAFVVSCF 64

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 601 

A DNA sequence (GBSx0641) was identified in S.agalactiae <SEQ ID 1859> which encodes the amino
acid sequence <SEQ ID 1860>. This protein is predicted to be capa protein. Analysis of this protein
sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.80 Transmembrane 27 - 43 ( 22 - 50)

Final Results

bacterial membrane Certainty=0.6519 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9385> which encodes amino acid sequence <SEQ ID 9386>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF13661 GB:AF188935 pXO2-56 [Bacillus anthracis]
Identities = 68/224 (30%), Positives = 118/224 (52%), Gaps = 10/224 (4%)

Query: 95 FKEVKSWIESADLAIGDYEGTISSE----YPLAGYPL-FNAPNEIATTMKETGYDVVDLA 149
F+ V +++++D G++E + E Y A + +A E +KE G+ V++LA
Sbjct: 87 FRHVSPYLKNSDYVSGNFEHPVLLEDKKNYQKADKNIHLSAKEETVKAVKEAGFTVLNIA 146

Query: 150 HNHILDSQLAGAINTVKTFNRLGLDTIGV^KDRNKEDILIKHVNGIKIAILGYSYGY-N 208
+NH+ D G +T+K F LD +G ++ ++I+ ++VNG+++A LG++ +
Sbjct: 147 NNHMTDYGAKGTKTJTIKAFKIMLDYVGAGENFKDVKNIVYQNVNGVRVATLGFTDAFVA 206

Query: 209 GMEANVSKSDYEKHMSDLDTKKIKQDIKKAEKEADITIVMPQMGIEYQKKPTTEQVMLYH 268
G A + D+ K+I + + AD+ +V G EY KP+ Q L
Sbjct: 207 GAIATKEQPGSLSMNPDVLLKQISKAKDPKKGNADLVVVNTHWGEEYDNKPSPRQEALAK 266

Query: 269 SMIKWGADIIFGGHPHVVEPSEVIKKDGQKKFIIYSMGNFISNQ 312
+M+ GADII G HPHV++ +V K+ I YS+GNF+ +Q
Sbjct: 267 AMVDAGADIIVGHHPHVLQSFDVYKQG----IIFYSLGNFVFDQ 306
A related DNA sequence was identified in S.pyogenes <SEQ ID 1861> which encodes the amino acid 
sequence <SEQ ID 1862>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.05 Transmembrane 44 - 60 ( 40 - 68)

Final Results

bacterial membrane Certainty=0.5819 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9119> which encodes the amino acid sequence
<SEQ ID 9120>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 31
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty= 0.582 (Affirmative) < succ>

bacterial outside Certainty= 0.000 (Not Clear) < succ>

bacterial cytoplasm Certainty= 0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 232/334 (69%), Positives = 273/334 (81%), Gaps = 4/334 (1%)

Query: 24 YQKTLIFCTAVIIAIFILGLSKEIAQSKGQKVANNNT VKTARWANGDILLHDVLY 79
Y+KT+ VA+I+A+ + GL DL + ++A + VKTARWANGDIL+HD+LY
Sbjct: 40 YKKT^IATWALIVALLLFGLIYDLLGVQK^ffiLAAQKSAQPKVKTARWANGDILIHDILY 99

Query: 80 ASARQPDGTYNFTPYFKEVKSWIESADLAIGDYEGTISSEYPLAGYPLFNAPNEIATTMK 139
SAR+ D TY+FTPYF+ VK WI ADLAIGDYEGTIS +YPLAGYPLFNAP EIA +K
Sbjct: 100 MSARKADDTYDFTPYFEYVKDWISGADLAIGDYEGTISPDYPLAGYPLFNAPEEIAGALK 159

Query: 140 ETGYDVVDLAHNHILDSQLAGAINTVKTFNRLGLDTIGOTLK^ 199
TGYDVVDLAHNHILDSQL GA+NT K F++LG+D+IG+Y KDR+KE LIK+VNGIKIA
Sbjct: 160 NTGYDVVDLAHNHILDSQLDGAIOTKOTFHQLGm^ 219

Query: 200 ILGYSYGYNGMEANVSKSDYEKHMSDLDTKKIKQDIKKAEKEADITIVMPQMGIEYQKKP 259
ILGYSYGYNGMEA +S+ DYEKHMSDLD KIK++++ AEK+AD+TIVMPQMG EY +P
Sbjct: 220 ILGYSYGYNGMEATLSQEDYEKHMSDLDEAKIKKELQLAEKKADVTIVMPQMGTEYALEP 279

Query: 260 TTEQVMLYHSMIKWGADIIFGGHPHVVEPSEVIKKDGQKKFIIYSMGNFISNQRLETVDD 319
T EQ LYH MI WGAD++ GGHPHV+EPSE + K QKKFIIYSMGNFISNQRLETVDD
Sbjct: 280 TAEQKELYHKMIDWGADVVLGGHPHVIEPSETVIKGRQKKFIIYSMGNFISNQRLETVDD 339

Query: 320 IWTERGLLMDVTIEKKGQKTVIKKVKAHPTLVEA 353
IWTERGLLMD+T EKK KT IK V+AHPT+V A
Sbjct: 340 IWTERGLLMDLTFEKKDNKTKIKTVEAHPTMVLA 373

A related GBS gene <SEQ ID 8615> and protein <SEQ ID 8616> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 18
Peak Value of UR: 3.83
Net Charge of CR: 2
McG: Discrim Score: 15.36
GvH: Signal Score (-7.5): -1.52
Possible site: 32
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 33
ALOM program count: 0 value: 4.35 threshold: 0.0
PERIPHERAL Likelihood = 4.35 170
modified ALOM score: -1.37

*** Reasoning Step: 3
Rule gpol

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

30.6/53.3% over 230aa
Bacillus anthracis
EGAD|20151| capa protein Insert characterized
SP|P19579|CAPA_BACAN CAPA PROTEIN. Edit characterized
GP|142633|gb|AAA22288.1||M24150 46 Kd encapsulation protein CapA Insert characterized
PIR|C30091|C30091 capA protein - Insert characterized

EGAD|20151|20674 (83 - 313 of 411) capa protein {Bacillus anthracis} SP|P19579|CAPA_BACAN
CAPA PROTEIN. GP|142633|gb|AAA22288.1||M24150 46 Kd encapsulation protein CapA {Bacillus
anthracis} PIR|C30091|C30091 capA protein - Bacillus anthracis

%Match = 8.9
%Identity = 30.6 %Similarity = 53.3
Matches = 70 Mismatches = 102 Conservative Sub.s = 52

468 498 528 558 585 615 645 663
LAQSKGQK^ANI^WKTARWANGDILLHDVLYASARQPDGTYNFTPY-FKEVKSWIESADLAIGDYEGTI SSEYP
IAATWQRTEAVAPVKHRENEKLTMTMVGDIMMGRHVKEIVmYGTDYVFRHVSPYLKNSDYVSGNFEHPVLLEDKKNYQ
50 60 70 80 90 100 110

690 720 750 780 810 840 870 900
LAGYPL-FNAPNEIATTMKETGYDVVDLAHN^^
KADKNIHLSAKEETVKAVKEAGFTVLNIANNHMTD^
130 140 150 160 170 180 190

927 957 987 1017 1047 1077 1107 1137
LGYSYGY-NGMEAWSKSDYEKHMSDLDTKKIKQDIKKAEKEADITIVMPQMGIEYQKKPTTEQVMLYHSMIKWGADIIF
LGFTDAFVAGAIATKEQPGSLSMNPDVLLKQISKAKDPKKGNA^
210 220 230 240 250 260 270

1167 1197 1227 1257 1287 1317 1347 1377
GGHPHVVEPSEVIKKDGQKKFIIYSMGNFISNQRLETVDDIWTERGL^
GHHPHVLQSFDVYK QGIIFYSLGNFVFDQGWTRTKDSALVQYHLRDNGTAILDWPLNIQEGSPKPVASALDKNRV
290 300 310 320 330 340 350

SEQ ID 8616 (GBS289) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 57 (lane 5; MW 40kDa), in Figure 181 (lane 6; MW 47kDa), in Figure 169 (lane
13 & 14; MW 54.5kDa - thioredoxin fusion) and in Figure 239 (lane 3; MW 54.5kDa). It was also
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 61
(lane 5; MW 65kDa).

SEQ ID 8616 (GBS289L) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 126 (lane 2; MW 72kDa) and in Figure 184 (lane 5; MW 72kDa). It was also
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 126
(lane 5-7; MW 47kDa).

GBS289L-His was purified as shown in Figure 234, lane 9-10. Purified GBS289L-GST is shown in Figure 
245, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

45 Example 602 

A DNA sequence (GBSx0642) was identified in S.agalactiae <SEQ ID 1863> which encodes the amino 
acid sequence <SEQ ID 1864>. This protein is predicted to be thiamin biosynthesis protein Thil (thil). 
Analysis of this protein sequence reveals the following: 

Possible site: 55 
50 »> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 2720 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 
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bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9971> which encodes amino acid sequence <SEQ ID 9972> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC00308 GB:AF008220 YtbJ [Bacillus subtilis] 
Identities = 184/354 (51%) , Positives = 249/354 (69%) 



Query: 11 MQYSEIMIRYGELSTKKKNRMRFINKLKNNMEHVLSIYPDVSVKTDRDRGHVYLNGTDYH 70 
M Y I+IR+GE+STK KNR FI +LK N+ VL YP++ ++RDR + LNG D 
Sbjct: 1 MNYDHILIRFGEISTKGKNRKSFIERLKQNIRLVLKDYPNLKYFSNRDRMTITLNGEDPE 60 

Query: 71 EVAESLKEIFGIQAFSPSFKVEKNVDTLVKAVQEIMTSVYKDGMTFKITAKRSDHSFELD 130 
+ LK++FGIQ+FS + K++D+ ++ YKG TFK+ KR+ FELD 
Sbjct: 61 ALFPHLKQVFGIQSFSLAIKCDSRLDDIKATALKAIKDQYKPGDTFKVATKRAYKQFELD 120 

Query: 131 SRALNHTLGDAVFSVLPNIKAQMKQPDINLKVEIRDEAAYISYEDIRGAGGLPVGTSGKG 190 
+ +N +G + + ++ PDI L++EIR+EA +++ D +GAGGLPVG++GK 
Sbjct: 121 TNQMNAEIGGHILRNTEGLTVDVRNPDIPLRIEIREEATFLTIRDEKGAGGLPVGSAGKA 180 

Query: 191 MLMLSGGIDSPVAGYLALKRGVDIEAVHFASPPYTSPGALKKAHDLTRKLTKFGGNIQFI 250 
MLMLSGG DSPVAG+ A+KRG+ +EAVHF SPPYTS A +K DL + L++FGG++ 
Sbjct: 181 MLMLSGGFDSPVAGFYAMKRGLSVEAVHFFSPPYTSERAKQKVMDLAKCLSRFGGSMTLH 240 

Query: 251 EVPFTEIQEEIKAKAPEAYLMTLTRRFMMRITDRIREDRNGLVIINGESLGQVASQTLES 310 
VPFT+ QE I+ + PE Y MT TRR M++I DRIRE RNGL II GESLGQVASQTLES 
Sbjct: 241 IVPFTKTQELIQKQIPENYTMTATRRLMLQIADRIREKRNGLAIITGESLGQVASQTLES 300 

Query: 311 MQAINAVTATPIIRPVVTMDKLEIIDIAQKIDTFDISIQPFEDCCTIFAPDRPK 364 
M AINAVT+TPI+RP++ MDK EII+ +++I T++ SIQPFEDCCTIF +P+ 
Sbjct: 301 MYAINAVTSTPILRPLIAMDRTEIIEKSREIGTYETSIQPFEDCCTIFTTAKPR 354 





A related DNA sequence was identified in S.pyogenes <SEQ ID 1865> which encodes the amino acid 
sequence <SEQ ID 1866>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4897 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 
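
The "Final Results" lines list one certainty score per predicted localisation. As an illustration only, the following Python sketch reads such lines and reports the highest-scoring localisation. The function names are hypothetical, and the pattern assumes the line layout shown above; it tolerates stray spaces inside the number, which occur in scanned copies.

```python
import re

# Assumed line shape: "bacterial <site> Certainty=<value> (<label>) ..."
# Optional spaces inside the number tolerate scanning artefacts such as "0 . 0000".
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s+Certainty\s*=\s*(?P<value>\d\s*\.\s*\d+)\s*"
    r"\((?P<label>[^)]+)\)"
)

def parse_final_results(text):
    """Return {site: (certainty, label)} for each localisation line found."""
    results = {}
    for match in LINE_RE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("label"))
    return results

def top_site(results):
    """Return the localisation with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])

example = """
bacterial cytoplasm Certainty=0.4897 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(top_site(parsed), parsed[top_site(parsed)])
```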



An alignment of the GAS and GBS proteins is shown below: 

Identities = 316/404 (78%) , Positives = 362/404 (89%) 

Query: 11 MQYSEIMIRYGELSTKKKNRMRFINKLKNNMEHVLSIYPDVSVKTDRDRGHVYLNGTDYH 70 
M YSEIM+R+GELSTK KNRMRFINKLKNN++ VL+ +P ++V++DRDR HV LNGTDY 
Sbjct: 1 MDYSEIMVRHGELSTKGKNRMRFINKLKNNIQDVLAPFPAITVRSDRDRTHVSLNGTDYQ 60 

Query: 71 EVAESLKEIFGIQAFSPSFKVEKNVDTLVKAVQEIMTSVYKDGMTFKITAKRSDHSFELD 130 
+ E+LK +FG+QA SP +K+EK+V LV AVQ+IMTS+Y+DG+TFKI KRSDH+FELD 
Sbjct: 61 PIVEALKLVFGVQALSPVYKLEKSVPLLVTAVQDIMTSLYRDGLTFKIATKRSDHAFELD 120 

Query: 131 SRALNHTLGDAVFSVLPNIKAQMKQPDINLKVEIRDEAAYISYEDIRGAGGLPVGTSGKG 190 
SR LN LG AVF VLPNI+AQMK PD+ LKVEIRDEAAYISYE+I+GAGGLPVGTSGKG 
Sbjct: 121 SRELNSLLGGAVFEVLPNIQAQMKHPDVTLKVEIRDEAAYISYEEIKGAGGLPVGTSGKG 180 

Query: 191 MLMLSGGIDSPVAGYLALKRGVDIEAVHFASPPYTSPGALKKAHDLTRKLTKFGGNIQFI 250 
MLMLSGGIDSPVAGYLALKRG+DIE VHFASPPYTSPGAL KA DLTR+LT+FGGNIQFI 
Sbjct: 181 MLMLSGGIDSPVAGYLALKRGLDIEVVHFASPPYTSPGALAKAQDLTRRLTRFGGNIQFI 240 

Query: 251 EVPFTEIQEEIKAKAPEAYLMTLTRRFMMRITDRIREDRNGLVIINGESLGQVASQTLES 310 
EVPFTEIQEEIK KAPEAYLMTLTRRFMMRITD IRE R GLVI+NGESLGQVASQTLES 
Sbjct: 241 EVPFTEIQEEIKNKAPEAYLMTLTRRFMMRITDAIREQRKGLVIVNGESLGQVASQTLES 300 

Query: 311 MQAINAVTATPIIRPVVTMDKLEIIDIAQKIDTFDISIQPFEDCCTIFAPDRPKTNPKIK 370 
MQAINAVT+TPIIRPVVTMDKLEII++AQ IDTFDISIQPFEDCCTIFAPDRPKTNPK+ 
Sbjct: 301 MQAINAVTSTPIIRPVVTMDKLEIIEMAQAIDTFDISIQPFEDCCTIFAPDRPKTNPKLG 360 

Query: 371 NTEQYEKRMDVEGLVERAVAGIMVTTIQPQADSDDVDDLIDDLL 414 
N E+YE+ D++GLV+RAV+GI +VT I P+ +D+V++LID LL 
Sbjct: 361 NAEKYEECFDIDGLVQRAVSGIVVTEITPEIVNDEVENLIDALL 404 
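
Each alignment is headed by a summary line giving match counts over the aligned length. The Python sketch below is a hedged illustration of how such a line can be read into numbers; the function name is hypothetical and the pattern assumes the "Identities = n/m (p%)" layout used in this document. It reports the plain ratio alongside the printed percentage and makes no assumption about how the alignment program rounded.

```python
import re

# Assumed field layout: "<Name> = <count>/<length> (<percent>%)"
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {field: {"count", "length", "printed_pct", "ratio"}}."""
    fields = {}
    for name, count, length, pct in FIELD_RE.findall(line):
        count, length = int(count), int(length)
        fields[name] = {
            "count": count,
            "length": length,
            "printed_pct": int(pct),
            "ratio": count / length,  # plain fraction, not rounded
        }
    return fields

summary = parse_summary("Identities = 316/404 (78%) , Positives = 362/404 (89%)")
print(summary["Identities"]["ratio"])
```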

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 603 

A DNA sequence (GBSx0643) was identified in S.agalactiae <SEQ ID 1867> which encodes the amino 
acid sequence <SEQ ID 1868>. This protein is predicted to be a NifS protein homolog (fragment). Analysis of 
this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.27 Transmembrane 131 - 147 ( 131 - 147) 

Final Results 

bacterial membrane Certainty=0.1107 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA43493 GB:X61190 nifS-like gene [Lactobacillus delbrueckii] 
Identities = 177/353 (50%) , Positives = 234/353 (66%) , Gaps = 1/353 (0%) 



Query: 14 PEVLRTYQEVASKIYGNPSSLHELGTTSSRILEASRKQIASLLELKANEIFFTSGGTEAD 73 
P+ L TY +V +KI+GNPSSLH+LG + +LEASRKQ+A LL + +EI+FTSGGTE++ 
Sbjct: 3 PKALETYSQVVTKIWGNPSSLHKLGDRAHGLLEASRKQVADLLGVNTDEIYFTSGGTESN 62 

Query: 74 NWVIKGLAFEKQHFGNHIIVSDIEHPAVKESAKWLGEYGFEIDYAPVDDKGFVDVEALVK 133 
N I KG A+ K+ FG HII S +EH +V + L GF + PVD +G V+ E L 
Sbjct: 63 NTAIKGTAWAKREFGKHIITSSVEHASVANTFTELENLGFRVTRLPVDKEGRVNPEDLKA 122 

Query: 134 LIKPETILISIMAINNEIGSIQPIKAISDLLSDKPTISFHVDAVQAIGKIPTKDYLTERV 193 
+ +T L+SIM +NNEIG+IQPIK IS++L+D P I FHVD VQA+GK T RV 
Sbjct: 123 ALDKDTTLVSIMGVNNEIGTIQPIKEISEILADYPNIHFHVDNVQALGKGIWDQVFTSRV 182 

Query: 194 DFASFSSHKFHGVRGVGFLYIKEGKRISPLLTGGGQETDLRSTTENVAGIAATAKALRMV 253 
D SFSSHKFHG RG+G LY K G+ + PL GGGQE LRS TEN+A IAA AKA R++ 
Sbjct: 183 DMMSFSSHKFHGPRGIGILYKKRGRMLMPLCEGGGQEKGLRSGTENLAAIAAMAKAARLL 242 

Query: 254 MDKEVVAIPKISKMKTIIHDELAKYEDITLFSG-KEDFSPNIITFGIKGVRGEVLVHAFE 312 
+ E + +K I LA I +FS K DF+P+I+ F ++G+RGE LVH E 
Sbjct: 243 LTDEKEKADREYAIKEKISKYLAGKPGIHIFSPLKADFAPHILCFALEGIRGETLVHTLE 302 

Query: 313 GHDIFISTTSACSSKAGKPAGTLIAMGISTKLAQTAVRISLDDDNDMGQVEQF 365 
DI+ISTTSAC+SK A TL+AM +A +AVR+S D+ N + + ++F 
Sbjct: 303 DQDIYISTTSACASKKADEASTLVAMKTPDAIATSAVRLSFDESNTLEEADEF 355 





A related DNA sequence was identified in S.pyogenes <SEQ ID 1869> which encodes the amino acid 
sequence <SEQ ID 1870>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 



WO 02/34771 



PCT/GB01/04789 



-686- 

Final Results 

bacterial cytoplasm Certainty=0.3067 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 268/370 (72%) , Positives = 322/370 (86%) 

Query: 1 MIYFDNSATTIPYPEVLRTYQEVASKIYGNPSSLHELGTTSSRILEASRKQIASLLELKA 60 
MIYFDN+ATTIPY E L+TYQEVA+KIYGNPSSLH+LGT +SRILEASRKQIA LL +K+ 
Sbjct: 1 MIYFDNAATTIPYGEALKTYQEVATKIYGNPSSLHQLGTNASRILEASRKQIAGLLGVKS 60 

Query: 61 NEIFFTSGGTEADNWVIKGLAFEKQHFGNHIIVSDIEHPAVKESAKWLGEYGFEIDYAPV 120 
EIFFTSGGTE+ NW IKG+AFEK FG HII+S IEHPAV ES KWL GFE+ YAPV 
Sbjct: 61 EEIFFTSGGTESANWAIKGIAFEKNAFGKHIIISAIEHPAVSESVKWLLTQGFEVSYAPV 120 

Query: 121 DDKGFVDVEALVKLIKPETILISIMAINNEIGSIQPIKAISDLLSDKPTISFHVDAVQAI 180 
+G VDV AL +LI+P+TILISIMA+NNE+G+IQPI+AIS+LL+++PTI+FHVDAVQAI 
Sbjct: 121 TTQGVVDVVALAELIRPDTILISIMAVNNEMGAIQPIRAISNLLANQPTITFHVDAVQAI 180 

Query: 181 GKIPTKDYLTERVDFASFSSHKFHGVRGVGFLYIKEGKRISPLLTGGGQETDLRSTTENV 240 
GKIP DY+T RVD ASFS HKFH VRGVGFLY K GKR++PLL+GGGQE +LRSTTENV 
Sbjct: 181 GKIPLCDYMTNRVDLASFSGHKFHSVRGVGFLYKKAGKRLNPLLSGGGQEQELRSTTENV 240 

Query: 241 AGIAATAKALRMVMDKEVVAIPKISKMKTIIHDELAKYEDITLFSGKEDFSPNIITFGIK 300 
AGIA+ AKALR+V +K+V +PK++ M+ +I+ L+ Y D+T+FS +E F+PNI+TFGI+ 
Sbjct: 241 AGIASMAKALRIVTEKQVSVLPKLTAMRDVIYKSLSAYPDVTVFSAQEGFAPNILTFGIR 300 

Query: 301 GVRGEVLVHAFEGHDIFISTTSACSSKAGKPAGTLIAMGISTKLAQTAVRISLDDDNDMG 360 
GVRGEV+VHAFE ++I+ISTTSACSSKAG+PAG+L+AMGI K AQTAVRISLDDDNDMG 
Sbjct: 301 GVRGEVIVHAFEKYEIYISTTSACSSKAGEPAGSLVAMGIPVKTAQTAVRISLDDDNDMG 360 

Query: 361 QVEQFLTIFK 370 
QVEQFLTIF+ 
Sbjct: 361 QVEQFLTIFQ 370 
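
Every Query and Sbjct row carries its start and end coordinates, so the number of residues in a row (gap characters excluded) should equal end minus start plus one. The Python sketch below illustrates this check, which can help to flag transcription errors in a row. The function name is hypothetical, and the row layout "Query:/Sbjct: start sequence end" is assumed from the alignments in this document.

```python
import re

# Assumed row shape: "Query: <start> <sequence> <end>" (or "Sbjct:").
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")

def row_is_consistent(line):
    """True when the residue count (gaps excluded) equals end - start + 1."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError("not an alignment row: %r" % line)
    start = int(match.group(2))
    sequence = match.group(3)
    end = int(match.group(4))
    residues = len(sequence.replace("-", ""))  # '-' marks an alignment gap
    return residues == end - start + 1

print(row_is_consistent("Query: 361 QVEQFLTIFK 370"))
```

A row that fails the check has lost or gained characters somewhere between its coordinates, although the check alone cannot say where.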

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 604 

A DNA sequence (GBSx0644) was identified in S.agalactiae <SEQ ID 1871> which encodes the amino 
acid sequence <SEQ ID 1872>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1539 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 




Example 605 

A DNA sequence (GBSx0645) was identified in S.agalactiae <SEQ ID 1873> which encodes the amino 
acid sequence <SEQ ID 1874>. This protein is predicted to be glutathione reductase (gor). Analysis of this 
protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.25 Transmembrane 170 - 186 ( 169 - 187) 

Final Results 

bacterial membrane Certainty=0.2699 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA76640 GB:AB019579 glutathione reductase (GR) [Streptococcus mutans] 
Identities = 274/450 (60%), Positives = 346/450 (76%), Gaps = 1/450 (0%) 



Query: 1 MSKQYDYIVIGGGSAGSGTANRAAMYGAKVLLIEGGQVGGTCVNLGCVPKKIMWYGAQVS 60 
M+KQYDYIVIGGGS G +ANRAAM+GAKV+L EG QVGGTCVN+GCVPKK+MWYGAQV+ 
Sbjct: 1 MTKQYDYIVIGGGSGGIASANRAAMHGAKVILPEGKQVGGTCVNVGCTPKKVMWYGAQVA 60 

Query: 61 ETLHKYSSGYGFEVNNLNFDFTTLKANRDAYVQRSRQSYAANFERNGVEKIDGFARFIDN 120 
ET++ Y++ YGF+V F F LK NR AY+ R + SY F+ NGVE++ +A F+D 
Sbjct: 61 ETINNYAADYGFDVTTQTFHFDALKQNRQAYIDRIQDSYERGFDSNGVERVYSYATFVDA 120 

Query: 121 HTIEVNGQQYKAPHITIATGGHPLYPDIIGSELGETSDDFFGWETLPDSILIVGAGYIAA 180 
HT+EV G+ Y APHI IATGGH L PDI GSE G TSD FF + +P +VGAGYIA 
Sbjct: 121 HTVEVAGEHYTAPHILIATGGHALLPDIPGSEYGITSDGFFELDAIPKRTAVVGAGYIAV 180 

Query: 181 ELAGVVNELGVETHLAFRKDHILRGFDDMVTSEVMAEMEKSGISLHANHVPKSLKRDEGG 240 
E++GV++ LG ETHL R+D LR FD + ++ EM+K G LH VPK + ++ 
Sbjct: 181 EISGVLHALGGETHLFVRRDRPLRKFDKEIVGTLVDEMKKDGPHLHTFSVPKEVIKNTDN 240 

Query: 241 KLIFEAENGKTLVVDRVIWAIGRGPNV-DMGLEOTDIVLNDKGYIKADEFENTSVDGVYA 299 
L ENG+ VD +IWAIGR N LE T + L+ +G+I D FENT+V+G+YA 
Sbjct: 241 SLTLILENGEEYTVDTLIWAIGRAANTKGFNLEVTGOTLDSRGFIATDAFENTNVEGLYA 300 

Query: 300 IGDVNGKIALTPVAIAAGRRLSERLFNHKDNEKLDYHNVPSVIFTHPVIGTVGLSEAAAI 359 
+GDVNGK+ LTPVA+ AGR+LSERLFNHK K+DY +V +VIF+HPVIG++GLSE A+ 
Sbjct: 301 LGDVNGKLELTPVAVKAGRQLSERLFNHKPQAKMDYKDVATVIFSHPVIGSIGLSEEVAL 360 

Query: 360 EQFGEDNIKVYTSTFTSMYTAVTTNRQAVKMKLITLGKEEKVIGLHGVGYGIDEMIQGFS 419 
+Q+GE+N+ VY STFTSMYTAVT++RQA KMKL+T+G++EK++GLHG+GYG+DEMIQGF+ 
Sbjct: 361 DQYGEENVTVYRSTFTSMYTAVTSHRQACKMKLVTVGEDEKIVGLHGIGYGVDEMIQGFA 420 

Query: 420 VAIKMGATKADFDDTVAIHPTGSEEFVTMR 449 
VAIKMGATKADFD+TVAIHPTGSEEFVTMR 
Sbjct: 421 VAIKMGATKADFDNTVAIHPTGSEEFVTMR 450 





A related DNA sequence was identified in S.pyogenes <SEQ ID 1875> which encodes the amino acid 
sequence <SEQ ID 1876>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.33 Transmembrane 173 - 189 ( 173 - 191) 

Final Results 

bacterial membrane Certainty=0.1532 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below: 
Identities = 268/446 (60%), Positives = 340/446 (76%), Gaps = 1/446 (0%) 

Query: 5 YDYIVIGGGSAGSGTANRAAMYGAKVLLIEGGQVGGTCVNLGCVPKKIMWYGAQVSETLH 64 
YDYIVIGGGSAG +ANRAAM+GAKVLL EG ++GGTCVNLGCVPKK+MWYGAQV++ L 
Sbjct: 8 YDYIVIGGGSAGIASANRAAMHGAKVLLAEGKEIGGTCVNLGCVPKKVMWYGAQVADILG 67 

Query: 65 KYSSGYGFEVNNLNFDFTTLKANRDAYVQRSRQSYAANFERNGVEKIDGFARFIDNHTIE 124 
Y+ YGF+ FDF LKANR AY+ R SY FE+NGV++I +A F D HT+E 
Sbjct: 68 TYAKDYGFDFKEKAFDFKQLKANRQAYIDRIHASYERGFEQNGVDRIYDYAVFKDAHTVE 127 

Query: 125 VNGQQYKAPHITIATGGHPLYPDIIGSELGETSDDFFGWETLPDSILIVGAGYIAAELAG 184 
+ GQ Y APHI IATGGHP++PDI G++ G +SD FF + +P +VGAGYIA ELAG 
Sbjct: 128 IAGQLYTAPHILIATGGHPVFPDIEGAQYGISSDGFFALDEVPKRTAVVGAGYIAVELAG 187 

Query: 185 VVNELGVETHLAFRKDHILRGFDDMVTSEVMAEMEKSGISLHANHVPKSLKRDEGGKLIF 244 
V++ LG +T L R D LR FD + ++ EM +G LH + + ++ L 
Sbjct: 188 VLHALGSKTDLFIRHDRPLRSFDKTIVDVLVDEMAVNGPRLHTHAEVAKVVKNTDESLTL 247 

Query: 245 EAENGKTLVVDRVIWAIGRGPNVD-MGLEOTDIVLNDKGYIKADEFENTSVDGVYAIGDV 303 
++G+ + VD++IWAIGR PN++ L+ T + LNDKGYI+ D +ENTSV G+YA+GDV 
Sbjct: 248 YLKDGQEVEVDQLIWAIGRKPNLEGFSLDKTGVTLNDKGYIETDAYENTSVKGIYAVGDV 307 

Query: 304 NGKIALTPVAIAAGRRLSERLFNHKDNEKLDYHNVPSVIFTHPVIGTVGLSEAAAIEQFG 363 
NGK+ALTPVA+AAGRRLSERLFN K +EKLDY NV +VIF+HPVIG+VGLSE AA++Q+G 
Sbjct: 308 NGKLALTPVAVAAGRRLSERLFNGKTDEKLDYQNVATVIFSHPVIGSVGLSEEAAVKQYG 367 

Query: 364 EDNIKVYTSTFTSMYTAVTTNRQAVKMKLITLGKEEKVIGLHGVGYGIDEMIQGFSVAIK 423 
++ +K Y S FTSM+TA+T +RQ MKL+T+G EK++GLHG+GYG+DEMIQGF+VAIK 
Sbjct: 368 QEAVKTYQSRFTSMFTAITNHRQPCLMKLVTVGDTEKIVGLHGIGYGVDEMIQGFAVAIK 427 

Query: 424 MGATKADFDDTVAIHPTGSEEFVTMR 449 
MGATKADFD+TVAIHPTGSEEFVTMR 
Sbjct: 428 MGATKADFDNTVAIHPTGSEEFVTMR 453 

SEQ ID 1874 (GBS417) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 5; MW 53kDa). 

GBS417-His was purified as shown in Figure 216, lane 2. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 606 

A DNA sequence (GBSx0646) was identified in S.agalactiae <SEQ ID 1877> which encodes the amino 
acid sequence <SEQ ID 1878>. Analysis of this protein sequence reveals the following: 



Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3122 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC62417 GB:AF084104 hypothetical protein [Bacillus firmus] 
Identities = 33/110 (30%) , Positives = 66/110 (60%) 

Query: 1 MANVYDLANELERAVRALPEYQAVLTAKSAIESDADAQVLWQDFLATQSKVQEMMQSGQM 60 
M+NVYD A+EL++A+ E+ A+ + IE+D A+ + ++F Q ++Q+ G 
Sbjct: 1 MSNVYDKAHELKKA1AESEEFSALKSMHEEIEMEIAKKMLENFRNLQLELQQKQMQGIQ 60 

Query: 61 PSQEEQDEMSKLGEKIESNDLLKVYFDQQQRLSVYMSDIEKIVFAPMQDL 110 
++EE + + E ++ ++L+ + +QRLSV + DI KI+ P++++ 
Sbjct: 61 ITEEEAQKAQQQFELVQQHELISKLMEAEQRLSVIIGDINKIITEPLEEI 110 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1879> which encodes the amino acid 
sequence <SEQ ID 1880>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4058 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/108 (62%) , Positives = 86/108 (78%) 

Query: 4 VYDIANELERAVRALPEYQAVLTAKSAIESDADAQVLWQDFLATQSKVQEMMQSGQMPSQ 63 

+YD AN+LERAVRALPEYQ VL K AI++D A L+ +F+A Q K+Q MMQSGQMP+ 
Sbjct: 5 IYDYANQLERAVRALPEYQKVLEVKEAIQADVSASELFDEFVAMQEKIQGMMQSGQMPTA 64 

Query: 64 EEQDEMSKLGEKIESNDLLKVYFDQQQRLSVYMSDIEKIVFAPMQDLM 111 

EEQ + +L +KIE+ND LK YF+ QQ LSVYMSDIE+IVFAP++DL+ 
Sbjct: 65 EEQTSIQELSQKIEANDQLKAYFEAQQALSVYMSDIERIVFAPLKDLV 112 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 607 

A DNA sequence (GBSx0647) was identified in S.agalactiae <SEQ ID 1881> which encodes the amino 
acid sequence <SEQ ID 1882>. This protein is predicted to be chorismate synthase (aroC). Analysis of this 
protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.67 Transmembrane 343 - 359 ( 341 - 364) 

Final Results 

bacterial membrane Certainty=0.2869 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05375 GB:AP001512 chorismate synthase [Bacillus halodurans] 
Identities = 227/381 (59%) , Positives = 282/381 (73%) , Gaps = 2/381 (0%) 

Query: 1 MRYLTAGESHGPSLTAIIEGIPAGLKLSAKDINEDLKRRQGGYGRGNRMKIETDQVIISS 60 
MRYLTAGESHGP LT IIEG PA L+L A DIN DL RRQGG+GRG RM+IE DQV I 
Sbjct: 1 MRYLTAGESHGPQLTTIIEGAPAQLELVADDINVDLARRQGGHGRGRRMQIEKDQVQIVG 60 

Query: 61 GVRHGKTLGSPITLTVTNKDHSKWLDIMSVEDI--EERLKQKRRIKHPRPGHADLVGGIK 118 
G+RHGKT G+PI L V NKD W IM E + +E + KR+I PRPGHADL G IK 
Sbjct: 61 GIRHGKTTGAPIALVVENKDWKHWTKIMGAEPLTGDEEKEIKRKITRPRPGHADLNGAIK 120 

Query: 119 YRFDDLRNALERSSARETTMRVAIGAIAKRILKEIGIEIANHIVVFGGKEITVPDKLTVQ 178 
Y D+RN LERSSARETT+RVA GA+AK+IL+ GIE+ +H++ GG + + 
Sbjct: 121 YGHRDMRNVLERSSARETTVRVAAGAVAKKILRTFGIEVGSHVLEIGGVKAEKTSYDQLS 180 

Query: 179 QIKVLSSQSQVAIVNPSFEQEIKDYIDSVKKAGDTIGGVVETIVGGVPVGLGSYVHWDRK 238 
+K L+ S V ++ EQE+ ID K+ GD+IGGVVE IV GVP+GLGS+VH+DRK 
Sbjct: 181 NLKELAEASPVRCLDKEAEQEMIAAIDQAKENGDSIGGVVEVIVEGVPIGLGSHVHYDRK 240 

Query: 239 LDAKIAQAVVSINAFKGVEFGLGFKSGFLKGSQVMDSISWTKDQGYIRQSNNLGGFEGGM 298 
LDAKIA AV+SINAFKGVEFG+GF++ GS+V D I+W +++GY R+SNNLGGFEGGM 
Sbjct: 241 LDAKIAAAVMSINAFKGVEFGIGFEAASKPGSEVHDEIAWDEERGYYRKSNNLGGFEGGM 300 

Query: 299 TNGEPIIVRGVMKPIPTLYKPLMSVDIDTHEPYRATVERSDPTALPAAGVVMEAVVATVL 358 
TNG PI+VRGVMKPIPTLYKPL SVDI T EP+ A++ERSD A+PAA W EAWA + 
Sbjct: 301 TNGMPIVVRGVMKPIPTLYKPLQSVDIATKEPFAASIERSDSCAVPAAAWAEAWAWEV 360 

Query: 359 VTEVLEKFSSDNMYELKEAVK 379 
+LE+F +D + E+++ ++ 
Sbjct: 361 ANALLERFGADQVEEIEKNIR 381 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1883> which encodes the amino acid 
sequence <SEQ ID 1884>. Analysis of this protein sequence reveals the following: 

Possible site: 15 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.75 Transmembrane 342 - 358 ( 342 - 359) 
INTEGRAL Likelihood = -0.16 Transmembrane 155 - 171 ( 155 - 171) 

Final Results 

bacterial membrane Certainty=0.1298 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:BAB05375 GB:AP001512 chorismate synthase [Bacillus halodurans] 
Identities = 213/390 (54%) , Positives = 277/390 (70%) , Gaps = 2/390 (0%) 

Query: 1 LRYLTAGESHGPSLTAIIEGIPAGLTLHPADIDHELQRRQGGYGRGARMSIETDRVQISS 60 
+RYLTAGESHGP LT IIEG PA L L DI+ +L RRQGG+GRG RM IE D+VQI 
Sbjct: 1 MRYLTAGESHGPQLTTIIEGAPAQLELVRDDINVDLARRQGGHGRGRRMQIEKDQVQIVG 60 

Query: 61 GVRHGKTTGAPITLTVINKDHQKWLDVMAVGDI--EETLKLKRRVKHPRPGHADLVGGIK 118 
G+RHGKTTGAPI L V NKD + W +M + +E ++KR++ PRPGHADL G IK 
Sbjct: 61 GIRHGKTTGAPIALVVENKDWKHWTKIMGAEPLTGDEEKEIKRKITRPRPGHADLNGAIK 120 

Query: 119 YHFNDLRDALERSSARETTMRVAVGAVAKRILAELGIDMLHHILIFGGITITIPSKLSFR 178 
Y D+R+ LERSSARETT+RVA GAVAK+IL GI + + H+L GG+ S 
Sbjct: 121 YGHRDMRNVLERSSARETTVRVAAGAVAKKILRTFGIEVGSHVLEIGGVKAEKTSYDQLS 180 

Query: 179 ELQERALHSELSIVNPKQEEEIKTYIDKIKKEGDTIGGIIETIVQGVPAGLGSYVQWDKK 238 
L+E A S + ++ + E+E+ ID+ K+ GD+IGG++E IV+GVP GLGS+V +D+K 
Sbjct: 181 NLKELAEASPVRCLDKEAEQEMIAAIDQAKENGDSIGGVVEVIVEGVPIGLGSHVHYDRK 240 

Query: 239 LDAKLAQAVLSINAFKGVEFGAGFDMGFQKGSQVMDEITWTPTQGYGRQTNHLGGFEGGM 298 
LDAK+A AV+SINAFKGVEFG GF+ + GS+V DEI W +GY R++N+LGGFEGGM 
Sbjct: 241 LDAKIAAAVMSINAFKGVEFGIGFEAASKPGSEVHDEIAWDEERGYYRKSNNLGGFEGGM 300 

Query: 299 TTGQPLVVKGVMKPIPTLYKPLMSVDIDSHEPYKATVERSDPTALPAAGVIMENVVATVL 358 
T G P+VV+GVMKPIPTLYKPL SVDI + EP+ A++ERSD A+PAA V+ E WA + 
Sbjct: 301 TNGMPIVVRGVMKPIPTLYKPLQSVDIATKEPFAASIERSDSCAVPAAAWAEAWAWEV 360 

Query: 359 AKEILETFSSTTMSELQKAFSDYRAYVKQF 388 
A +LE F + + E++K ++ + F 
Sbjct: 361 ANALLERFGADQVEEIEKNIREFNEKARLF 390 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 284/388 (73%) , Positives = 333/388 (85%) 



Query: 1 MRYLTAGESHGPSLTAIIEGIPAGLKLSAKDINEDLKRRQGGYGRGNRMKIETDQVIISS 60 
+RYLTAGESHGPSLTAIIEGIPAGL L DI+ +L+RRQGGYGRG RM IETD+V ISS 
Sbjct: 1 LRYLTAGESHGPSLTAIIEGIPAGLTLHPADIDHELQRRQGGYGRGARMSIETDRVQISS 60 

Query: 61 GVRHGKTLGSPITLTVTNKDHSKWLDIMSVEDIEERLKQKRRIKHPRPGHADLVGGIKYR 120 
GVRHGKT G+PITLTV NKDH KWLD+M+V DIEE LK KRR+KHPRPGHADLVGGIKY 
Sbjct: 61 GVRHGKTTGAPITLTVINKDHQKWLDVMAVGDIEETLKLKRRVKHPRPGHADLVGGIKYH 120 

Query: 121 FDDLRNALERSSARETTMRVAIGAIAKRILKEIGIEIANHIVVFGGKEITVPDKLTVQQI 180 
F+DLR+ALERSSARETTMRVA+GA+AKRIL E+GI++ +HI++FGG IT+P KL+ +++ 
Sbjct: 121 FNDLRDALERSSARETTMRVAVGAVAKRILAELGIDMLHHILIFGGITITIPSKLSFREL 180 

Query: 181 KVLSSQSQVAIVNPSFEQEIKDYIDSVKKAGDTIGGVVETIVGGVPVGLGSYVHWDRKLD 240 
+ + S+++IVNP E+EIK YID +KK GDTIGG++ETIV GVP GLGSYV WD+KLD 
Sbjct: 181 QERALHSELSIVNPKQEEEIKTYIDKIKKEGDTIGGIIETIVQGVPAGLGSYVQWDKKLD 240 

Query: 241 AKIAQAVVSINAFKGVEFGLGFKSGFLKGSQVMDSISWTKDQGYIRQSNNLGGFEGGMTN 300 
AK+AQAV+SINAFKGVEFG GF GF KGSQVMD I+WT QGY RQ+N+LGGFEGGMT 
Sbjct: 241 AKLAQAVLSINAFKGVEFGAGFDMGFQKGSQVMDEITWTPTQGYGRQTNHLGGFEGGMTT 300 

Query: 301 GEPIIVRGVMKPIPTLYKPLMSVDIDTHEPYRATVERSDPTALPAAGVVMEAVVATVLVT 360 
G+P++V+GVMKPIPTLYKPLMSVDID+HEPY+ATVERSDPTALPAAGV+ME VVATVL 
Sbjct: 301 GQPLVVKGVMKPIPTLYKPLMSVDIDSHEPYKATVERSDPTALPAAGVIMENVVATVLAK 360 

Query: 361 EVLEKFSSDNMYELKEAVKLYRNYVDHF 388 
E+LE FSS M EL++A YR YV F 
Sbjct: 361 EILETFSSTTMSELQKAFSDYRAYVKQF 388 



A related GBS gene <SEQ ID 8617> and protein <SEQ ID 8618> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: -2.42 
GvH: Signal Score (-7.5): -3.23 

Possible site: 15 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -4.67 threshold: 0.0 
INTEGRAL Likelihood = -4.67 Transmembrane 343 - 359 ( 341 - 364) 
PERIPHERAL Likelihood = 0.69 214 
modified ALOM score: 1.43 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2869 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

57.7/73.8% over 354aa 

Bacillus subtilis 

EGAD|20299| chorismate synthase Insert characterized 

SP|P31104|AROC_BACSU CHORISMATE SYNTHASE (EC 4.6.1.4) (5-ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE 
PHOSPHOLYASE) 

(VEGETATIVE PROTEIN 216) (VEG216). Edit characterized 
GP|143806|gb|AAA20859.1||M80245 AroF Insert characterized 
GP|2634689|emb|CAB14187.1||Z99115 chorismate synthase Insert characterized 
PIR|C69590|C69590 chorismate synthase aroF - Insert characterized 

ORF00121(301 - 1359 of 1719) 

EGAD|20299|BS2267(1 - 355 of 368) chorismate synthase {Bacillus 
subtilis}SP|P31104|AROC_BACSU CHORISMATE SYNTHASE (EC 4.6.1.4) (5-ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE 
PHOSPHOLYASE) (VEGETATIVE PROTEIN 216) (VEG216).GP|143806|gb|AAA20859.1||M80245 
AroF {Bacillus subtilis}GP|2634689|emb|CAB14187.1||Z99115 chorismate synthase {Bacillus 
subtilis}PIR|C69590|C69590 chorismate synthase aroF - Bacillus subtilis 
%Match = 35.0 

%Identity =57.6 %Similarity =73.7 

Matches =204 Mismatches = 92 Conservative Sub.s = 57 



75 105 135 165 195 225 255 285 






IQLSRVAERKNLMPRGISQDIYNMCLKFGLPVHYAEWDKDVLFDILSHDKKASGQFIKIVILPQLGSATVHQIPLEEMRD 

315 345 375 405 435 465 495 525 

YLEK*MRYLTAGESHGPSLTAIIEGIPAGLKLSAKDINEDLKRRQGGYGRGNRMKIETDQVIISSGVRHGKTLGSPITLT 
Illlllllllll II IIIMIII :: =111 =| III MM Ihll || | UNI =111111 I 

MRYLTAGESHGPQLTTIIEGVPAGLYITEEDINFELARRQKGHGRGRRMQIEKDQAKIMSGVRHARTLGSPIALV 
10 20 30 40 50 60 70 

555 609 639 669 699 729 759 

VTNKDHSKWLDIMSVEDI--EERLKQKRRIKHPRPGHADLVGGIKYRFDDLRNALERSSARETTMRVAIGAIAKRILKEI 

I I I I II I =1 = Ihl llllllll I III Ml lllllllllhlll MM h 

VEimiWKHWTKIMGAAPITEDEEKEMKRQISRPRPGHADLNGAIKYNHRDMRNVLERSSARETTVRVAAGAVA 

90 100 110 120 130 140 150 

789 819 849 879 909 939 969 999 

GIEIANHIVVFGGKEITVPDKLTVQQIKVLSSQSQVAIVNPSFEQEIKDYIDSVKKAGDTIGGVVETIVGGVPVGLGSYV 
lh = l h: I = = = = == == =11 = = = = II I Ihllhll || hllhllll 

GIKVAGHVLQIGAVKAEKTGYTSIEDLQRVTEESPWCimEEAGKKMMAAIDFAKANGDSIGGIVEVlVEGM 

170 180 190 200 210 220 230 

1029 1059 1089 1119 1149 1179 1209 1239 

HWDRKLDAKIAQAVVSINAFKGVEFGLGFKSGFLKGSQVMDSISWTKDQGYIRQSNNLGGFEGGMTNGEPIIVRGVMKPI 

hllllhhl MIIIIMIIMM Ihl I I | = = = || | =| 1 1 1 M 1 1 1 1 I Ihllllllll 

HYDRKLDSKLAAAVLSINAFKGWFGIGFFJUiGRNGSEVHDEIIWDEEKGYTRATNRLGGLEGGMTTGMPIVVRGVMKPI 
250 260 270 280 290 300 310 

1269 1299 1329 1359 1389 1419 1449 1479 

PTLYKPLMSVDIDTHEPYRATVERSDPTALPAAGVVMEAVVATVLTC^ 

mini 1 1 ihi n= mini hiii ii ih i = 

PTLYKPLKSVDIETKEPFSASIERSDSCAVPAASWAEALSIjGKLQPSLNNSD 
330 340 350 360 

SEQ ID 8618 (GBS192) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 4; MW 44kDa). 

GBS192-His was purified as shown in Figure 196, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 608 

A DNA sequence (GBSx0648) was identified in S.agalactiae <SEQ ID 1885> which encodes the amino 
acid sequence <SEQ ID 1886>. This protein is predicted to be 3-dehydroquinate synthase (aroB). Analysis 
of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.82 Transmembrane 99 - 115 ( 98 - 116) 

Final Results 

bacterial membrane Certainty=0.2529 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA18068 GB:D90911 3-dehydroquinate synthase [Synechocystis sp.] 
Identities = 138/351 (39%) , Positives = 200/351 (56%) , Gaps = 4/351 (1%) 

Query: 3 VEVTJLPJSIHPYHIKIEEGCFSFAGDWSHLWQKQMITIITDSNVEILYGESLVNQLKKQGF 62 
+ V LP PY ++I G + D ++ L +1 ++++ + YGE ++ L++ G+ 
Sbjct: 5 IPVPLPQSPYQVQIVPGGLAAIADHLAPLGLGKKIMVVSNPEIYDYYGEVVIQALQRAGY 64 

Query: 63 TVHVFSFAAGEASKTLEVANRIYAFLAKHHMTRSDGIIALGGGVVGDLAAFVASTYMRGI 122 
V AGE KTL N +Y + ++ R+ +++LGGGV+GD+ F A+T++RGI 
Sbjct: 65 EVFQHLIPAGETHKTLASINELYDVAFQANLERNSTLLSLGGGVIGDMTGFGAATWLRGI 124 

Query: 123 HFLQIPTSLTAQVDSSIGGKTGVNTSFAKNMVGTFAQPDGVLIDPVTLKTLGNRELVEGM 182 
+F+Q+PTSL A VD+S IGGKTGVN KN++G F QP V IDPV LKTL RE GM 
Sbjct: 125 NFVQVPTSLLAMVDASIGGKTGVNHPQGKNLIGAFYQPRLVYIDPVVLKTLPEREFRAGM 184 

Query: 183 GEVIKYGLIDDIKLWHILEEMD--GTIDSILDNALA-IIYHSCQVKRKHVLADQYDKGLR 239 
EVIKYG+I D +L+ LEE + +ID + D L II SCQ K V D+ + GLR 
Sbjct: 185 AEVIKYGVIWDSELFTALEEAEDLSSIDRLPDELLTKIIQRSCQAKVDWSQDEKEAGLR 244 

Query: 240 MHLNFGHTIGHAIEVHAGYGEIMHGEAVAIGMIQLSRVAERKNLMPRGISQDIYNMCLKF 299 
LN+GHT+GH +E GYG I HGEAVAIGM + + +A L + + + LK 
Sbjct: 245 AILNYGHTVGHGVESLTGYGVINHGEAVAIGMEAAAKIAHYLGLCDQSLGDRQRQLLLKT 304 

Query: 300 GLPVHY-AEWDKDVLFDILSHDKKASGQFIKIVILPQLGSATVHQIPLEEM 349 
LP + L L HDKK ++ ++ +G T+ +E+ 
Sbjct: 305 KLPTEMPPTLAVENLLASLLHDKKVKAGKVRFILPTAIGQVTISDAVTDEV 355 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1887> which encodes the amino acid 

sequence <SEQ ID 1888>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.43 Transmembrane 97 - 113 ( 97 - 114) 

Final Results 

bacterial membrane Certainty=0.1171 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:BAA18068 GB:D90911 3-dehydroquinate synthase [Synechocystis sp.] 
Identities = 123/349 (35%) , Positives = 190/349 (54%) , Gaps = 9/349 (2%) 

Query: 1 MPQTLHVHSRVKDYDILFTDHVLKTLADCLGERKQ-RKLLFITDQTVYHLYQTLFEEFAQ 59 
M T+ V Y + L +AD L +K++ +++ +Y Y + + Q 
Sbjct: 1 MATTIPVPLPQSPYQVQIVPGGLAAIADHLAPLGLGKKIMVVSNPEIYDYYGEVVIQALQ 60 

Query: 60 Q--YNAFVHVCPPGGQSKSLERVSAIYDQLIAENFSKKDMIVTIGGGVVGDLGGFVAATY 117 
+ Y F H+ P G K+L ++ +YD N + ++++GGGV+GD+ GF AAT+ 
Sbjct: 61 RAGYEVFQHLIPAGETHKTLASINELYDVAFQANLERNSTLLSLGGGVIGDMTGFGAATW 120 

Query: 118 YRGIPYIQIPTTLLSQVDSSIGGKVGVHFKGLTNMIGSIYPPEAIIISTTFLETLPQREF 177 
RGI ++Q+PT+LL+ VD+SIGGK GV+ N+IG+ Y P +1 L+TLP+REF 
Sbjct: 121 LRGINFVQVPTSLLAMVDASIGGKTGVNHPQGKNLIGAFYQPRLVYIDPVVLKTLPEREF 180 

Query: 178 SCGISEMLKIGFIHDRPLFQQLRDFQ KETDKQGLERLIYQSI SNKKRTVEQDEFE 232 
G++E++K GIDLFL + + +L ++I +S K +V QDE E 
Sbjct: 181 RAGMAEVIKYGVIWDSELFTALEEAEDLSSIDRLPDELLTKIIQRSCQAKVDWSQDEKE 240 

Query: 233 NGLRMSLNFGHTLGHAIESLCHHDFYHHGEAIAIGMVVDAKLAVSKGLLPKEDLDSLLQV 292 
GLR LN+GHT+GH +ESL + +HGEA+AIGM AK+A GL + D Q+ 
Sbjct: 241 AGLRAILNYGHTVGHGVESLTGYGVINHGEAVAIGMEAAAKIAHYLGLCDQSLGDRQRQL 300 

Query: 293 FERYQLPTTLERADVSATSLFDVFKTDKKNSEQHIIFILPTETGFTTLA 341 
+ +LPT + ++ +L DKK + FILPT G T++ 
Sbjct: 301 LLKTKLPTEMP-PTLAVENLLASLLHDKKVKAGKVRFILPTAIGQVTIS 348 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 121/332 (36%) , Positives = 182/332 (54%) , Gaps = 7/332 (2%) 



Query: 12 YHIKIEEGCFSFAGDWSHLWQKQMITIITDSNVEILYGESLVNQLKKQGFTVHVFSFAA 71 
Y I + D + Q++++ ITD V LY ++L + +Q + V 
Sbjct: 14 YDILFTDHVLKTLADCLGERKQRKLL-FITDQTVYHLY-QTLFEEFAQQ-YNAFVHVCPP 70 

Query: 72 GEASKTLEVANRIYAFLAKHHMTRSDGIIALGGGVVGDLAAFVASTYMRGIHFLQIPTSL 131 
G SK+LE + IY L + ++ D I+ +GGGVVGDL FVA+TY RGI ++QIPT+L 
Sbjct: 71 GGQSKSLERVSAIYDQLIAENFSKKDMIVTIGGGVVGDLGGFVAATYYRGIPYIQIPTTL 130 

Query: 132 TAQVDSSIGGKTGVNTSFAKNMVGTFAQPDGVLIDPVTLKTLGNRELVEGMGEVIKYGLI 191 
+QVDSSIGGK GV+ NM+G+ P+ ++I L+TL RE G+ E++K G I 
Sbjct: 131 LSQVDSSIGGKVGVHFKGLTNMIGSIYPPEAIIISTTFLETLPQREFSCGISEMLKIGFI 190 

Query: 192 DDIKLWHILEEMDGTIDSILDNALAIIYHSCQVKRKHVLADQYDKGLRMHLNFGHTIGHA 251 
D L+ Ii 4- D +1Y S K++ V D+++ GLRM LNFGHT+GHA 
Sbjct: 191 HDRPLFQQLRDFQKETDK--QGLERLIYQSISNKKRIVEQDEFENGLRMSLNFGHTLGHA 248 

Query: 252 IEVHAGYGEIMHGEAVAIGMIQLSRVAERKNLMPRGISQDIYNMCLKFGLP--VHYAEWD 309 
IE + HGEA+AIGM+ +++A K L+P+ + + ++ LP + A+ 
Sbjct: 249 IESLCHHDFYHHGEAIAIGMVVDAKLAVSKGLLPKEDLDSLLQVFERYQLPTTLERADVS 308 

Query: 310 KDVLFDILSHDKKASGQFIKIVILPQLGSATV 341 
LFD+ DKK S Q I ++ + G T+ 
Sbjct: 309 ATSLFDVFKTDKKNSEQHIIFILPTETGFTTL 340 

SEQ ID 1886 (GBS336) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 62 (lane 2; MW 42.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 5; MW 68kDa). 
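
The two fusion products differ only in their tag, so the difference between the two apparent molecular weights should be close to the difference between the tag masses. The Python sketch below works this through for the GBS336 values quoted above. The tag masses (about 26 kDa for GST, about 1 kDa for a His tag) and the tolerance are assumptions for illustration and are not taken from this document; the actual values depend on the expression vector used.

```python
# Apparent molecular weights (kDa) quoted above for GBS336.
HIS_FUSION_KDA = 42.7
GST_FUSION_KDA = 68.0

# Assumed tag masses (kDa); illustrative values, not from this document.
ASSUMED_GST_TAG_KDA = 26.0
ASSUMED_HIS_TAG_KDA = 1.0

def tag_shift(gst_fusion_kda, his_fusion_kda):
    """Observed mass difference between the GST- and His-fusion products."""
    return round(gst_fusion_kda - his_fusion_kda, 1)

def shift_is_plausible(observed, tolerance_kda=3.0):
    """Compare the observed shift with the assumed tag-mass difference."""
    expected = ASSUMED_GST_TAG_KDA - ASSUMED_HIS_TAG_KDA
    return abs(observed - expected) <= tolerance_kda

observed_shift = tag_shift(GST_FUSION_KDA, HIS_FUSION_KDA)
print(observed_shift, shift_is_plausible(observed_shift))
```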

The GBS336-GST fusion product was purified (Figure 209, lane 4) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 310), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 609 

A DNA sequence (GBSx0649) was identified in S.agalactiae <SEQ ID 1889> which encodes the amino 
acid sequence <SEQ ID 1890>. Analysis of this protein sequence reveals the following: 

Possible site: 47
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3884 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
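
The "Final Results" lines above follow a regular pattern (location, certainty score, call), so they can be tabulated mechanically. The following is a minimal sketch in Python, assuming that line layout; the helper names `parse_final_results` and `best_site` are hypothetical and are not part of the prediction software used in this document. The pattern tolerates stray spaces inside the score.

```python
import re

# Hypothetical helper: matches lines such as
# "bacterial cytoplasm Certainty=0.3884 (Affirmative) < succ>"
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<score>[\d.\s]+?)\s*\((?P<call>[^)]*)\)"
)

def parse_final_results(text):
    """Return {site: (certainty, call)} for each localisation line found."""
    results = {}
    for m in LINE_RE.finditer(text):
        # Remove any whitespace that crept into the number, e.g. "0 .3884".
        score = float(re.sub(r"\s+", "", m.group("score")))
        results[m.group("site")] = (score, m.group("call").strip())
    return results

def best_site(results):
    """Return the location with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])

example = """\
bacterial cytoplasm Certainty=0.3884 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(best_site(parsed), parsed)
```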

A related GBS nucleic acid sequence <SEQ ID 9973> which encodes amino acid sequence <SEQ ID 9974> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14240 GB:Z99116 3-dehydroquinate dehydratase [Bacillus subtilis]

Identities = 70/233 (30%) , Positives = 127/233 (54%) , Gaps = 12/233 (5%) 
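
The alignment summary line above reports each statistic as a count over the aligned length with a whole-number percentage. As a minimal sketch, assuming that "name = n/total (p%)" layout, the fields can be extracted as follows; `parse_alignment_header` is a hypothetical helper, and because the reported percentages are whole numbers, small differences from the exact ratio are to be expected.

```python
import re

# Matches fields such as "Identities = 70/233 (30%)"; tolerant of stray spaces.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_alignment_header(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(n), int(total), int(pct))
        for name, n, total, pct in FIELD_RE.findall(line)
    }

header = "Identities = 70/233 (30%) , Positives = 127/233 (54%) , Gaps = 12/233 (5%)"
stats = parse_alignment_header(header)
for name, (n, total, pct) in stats.items():
    # Show the exact ratio next to the whole-number value given in the text.
    print(f"{name}: {n}/{total} = {100 * n / total:.1f}% (reported {pct}%)")
```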

Query: 2 KIVVPVMPRSLEEA-QEIDLSKFDSVDIIEWRADALPK----DDIINVAPAIFEKFAGHE 56

KI++P+M ++ ++ E + K + DI+EWR D K + + + + +
Sbjct: 17 KIIIPLMGKTEKQILNEAEAVKLLNPDIVEWRVDVFEKANDREAVTKLISKLRKSLEDKL 76

Query: 57 IIFTLRTTREGGNIVLSDAEYVELIQKINSIYNPDYIDFEYFSHKEVFQEMLEFPN 112

+FT RT +EGG++ + ++ Y+ L++ + D ID E FS + ++

Sbjct: 77 FLFTFRTHKEGGSMEMDESSYLALLESAIQTKDIDLIDIELFSGDANVKALVSLAEENNV 136

Query: 113 -LVLSYHNFQETP--ENIMEIFSELTALAPRVVKIAVMPKNEQDVLDVMNYTRGFKTINP 169

+V+S H+F++TP +I+ ++ L + K+AVMP + D+L +++ T KTI
Sbjct: 137 YVVMSNHDFEKTPVKDEIISRLRKMQDLGAHIPKMAVMPMDTGDLLTLLDATYTMKTIYA 196

Query: 170 DQVYATVSMSKIGRISRFAGDVTGSSWTFAYLDSSIAPGQITISEMKRVKALL 222

D+ T+SM+ G ISR +G+V GS+ TF + + APGQI +SE++ V +L
Sbjct: 197 DRPIITMSMAATGLISRLSGEVFGSACTFGAGEEASAPGQIPVSELRSVLDIL 249

A related DNA sequence was identified in S.pyogenes <SEQ ID 1891> which encodes the amino acid 
sequence <SEQ ID 1892>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3248 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 160/225 (71%), Positives = 198/225 (87%)

Query: 1 MKIVVPVMPRSLEEAQEIDLSKFDSVDIIEWRADALPKDDIINVAPAIFEKFAGHEIIFT 60

M+IV PVMPR +EAQ ID+SK++ V++IEWRAD LPKD+I+ VAPAIFEKFAG EIIFT
Sbjct: 1 MRIVAPVMPRHFDEAQAIDISKYEDVNLIEWRADFLPKDEIVAVAPAIFEKFAGKEIIFT 60

Query: 61 LRTTREGGNIVLSDAEYVELIQKINSIYNPDYIDFEYFSHKEVFQEMLEFPNLVLSYHNF 120

LRT +EGGNI LS EYV++I++IN+IYNPDYIDFEYF+HK VFQEML+FPNL+LSYHNF
Sbjct: 61 LRWQEGGNITLSSQEYVDIIKEINAIYNPDYIDFEYFTHKSVFQEMLDFPNLILSYHNF 120

Query: 121 QETPENIMEIFSELTALAPRVVKIAVMPKNEQDVLDVMNYTRGFKTINPDQVYATVSMSK 180

+ETPEN+ME FSE+T LAPRVVKIAVMP++EQDVLD+MNYTRGFKT+NP+Q +AT+SM K
Sbjct: 121 EETPENLMEAFSEMTKLAPRVVKIAVMPQSEQDVLDLMNYTRGFKTLNPEQEFATISMGK 180

Query: 181 IGRISRFAGDVTGSSWTFAYLDSSIAPGQITISEMKRVKALLDAD 225
+GR+SRFAGDV GSSWT+ LD PGQ+T+++MKR+ +L+ D

Sbjct: 181 LGRLSRFAGDVIGSSWTYVSLDHVSGPGQVTLNDMKRIIEVLEMD 225

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 610

A DNA sequence (GBSx0650) was identified in S.agalactiae <SEQ ID 1893> which encodes the amino 
acid sequence <SEQ ID 1894>. Analysis of this protein sequence reveals the following: 



Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1195 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



WO 02/34771 



PCT/GB01/04789 



-696- 

Example 611 

A DNA sequence (GBSx0651) was identified in S.agalactiae <SEQ ID 1895> which encodes the amino 
acid sequence <SEQ ID 1896>. Analysis of this protein sequence reveals the following: 

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3431 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15862 GB:Z99123 alternate gene name: ipa-19d~similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 161/396 (40%), Positives = 235/396 (58%), Gaps = 11/396 (2%)

Query: 1 MNKLKVNSVVERKIKSGAQLLEKKDFDTSLVNQ----LVQLFSQSN-QFLGMAYLSPQNK 55

M L + KIK G L+EK+ S + LV + S+S +FL Y QNK

Sbjct: 1 MKLLTLKKAHAAKIKKGYPLIEKFALAGSAGHMKEGDLVDIVSESGGEFLARGYYGLQNK 60

Query: 56 GIGWLLSRQVFD-FNHDYFVSLFEKSREKRQKFEKSSQTTAYRLFNQDGDNFGGLTIDFY 114

G+GW L+R + + +F+S K+ + R K ++ TTA+RLFN +GD GG+TID+Y
Sbjct: 61 GVGWTLTRNKHEQIDQAFFLSKLTKAAQARAKLFEAQDTTAFRLFNGEGDGVGGVTIDYY 120

Query: 115 SDYALFSWYNEFVYTNRQMIVAAFKQVYPNIKGAYEKIRFKGLDF---ESAHLYGQEAPE 171

Y L WY++ +YT + M+++A ++ + K YEK RF + + G+

Sbjct: 121 DGYLLIQWYSKGIYTFKDMLISALDEMDLDYKAIYEKKRFDTAGQYVEDDDFVKGRRGEF 180

Query: 172 SFLILENNIKYSVFLNDGLMTGIFLDQHDVRKALATNLSEGKKVLNMFSYTAAFSVAAAV 231
+I EN I+Y+V LN+G MTGIFLDQ VRKA+ ++GK VLN FSYT AFSVAAA+

Sbjct: 181 PIIIQENGIQYAVDLNEGAMTGIFLDQRHVRKAIRDRYAKBOVLNTFSOT 240

Query: 232 GGALETTSVDLAKRSRELSKAHFDANQIVTDNHRFIVMDVFEYYKYAKRKHLSYDVIVID 291
GGA +TTSVD+A RS + F N++ + H VMDVF Y+ YA +K L +D+I++D
Sbjct: 241 GGAEKTTSVDVANRSLAKTIEQFSVNKLDYEAHDIKVMDVFNYFSYAAKKDLRFDLIILD 300

Query: 292 PPSFARNKKQTFSVTKDYYKLIEQALDILTPGGTIIASTNAANLTVSQFKKQLEKGFGKA 351

PPSFAR KK+TFS KDY L+++ + I G I+ASTN++ + +FK ++ F +
Sbjct: 301 PPSFARTKKRTFSAAKDYKNLLKETIAITADKGVIVASTNSSAFGMKKFKGFIDAAFKET 360

Query: 352 SHNYISLQQ--LPEDFTINDKDQQSNYLKVFTIKVK 385

+ Y +++ LPEDF + NYLKV ++ K

Sbjct: 361 NERYTIIEEFTLPEDFKTISAFPEGNYLKVVLLQKK 396

A related DNA sequence was identified in S.pyogenes <SEQ ID 1897> which encodes the amino acid
sequence <SEQ ID 1898>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2699 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 259/386 (67%), Positives = 315/386 (81%), Gaps = 1/386 (0%)

Query: 1 MNKLKVNSVVERKIKSGAQLLEKKDFDT-SLVNQLVQLFSQSNQFLGMAYLSPQNKGIGW 59
MNKL ++S VE+K+ +G QLL++KDF NQLVQL ++SN+ +G AY+S QNKGIGW
Sbjct: 1 MNKLYIDSFVEKKLTAGVQLLDEKDFSNIKEKNQLVQLVTKSNRPIGTAYISKQNKGIGW 60

Query: 60 LLSRQVFDFNHDYFVSLFEKSREKRQKFEKSSQTTAYRLFNQDGDNFGGLTIDFYSDYAL 119
Sbjct: 61 YLGPEKIDLSISYFVSLFSVAKAKRQDFAQSDETNAYRLFNQSGDGFGGVTIDLYKDFAV 120

Query: 120 FSWYNEFVYTNRQMIVAAFKQVYPNIKGAYEKIRFKGLDFESAHLYGQEAPESFLILENN 179
FSWYN FVY ++MI + AF+QV+P +KGAYEK RFKG D E+AHLYG+ A E+F ILEN
Sbjct: 121 FSWYNAFVYDKKEMIMEAFQQVFPEVKGAYEKCRFKGPDTETAHLYGELAQETFSILENG 180

Query: 180 IKYSVFLNDGLMTGIFLDQHDVRKALATNLSEGKKVLNMFSYTAAFSVAAAVGGALETTS 239
I Y VFLN+GLMTGIFLDQHDVR+AL L+ GK +LN+FSYTAAFSVAAA+GGA+ETTS
Sbjct: 181 IAYQVFLNEGLMTGIFLDQHDVRRALVDGLAMGKSLLNLFSYTAAFSVAAAMGGAIETTS 240

Query: 240 VDLAKRSRELSKAHFDANQIVTDNHRFIVMDVFEYYKYAKRKHLSYDVIVIDPPSFARNK 299
VDLAKRSRELS AHF+ NQ+ +H F+VMDVFEY+KYAKRK L +DVIVIDPPSFARNK
Sbjct: 241 VDLAKRSRELSLAHFEHNQLNi^SHHFVVMDVFEYFKYAKRKKLIFDVIVIDPPSFARNK 300

Query: 300 KQTFSVTKDYYKLIEQALDILTPGGTIIASTNAANLTVSQFKKQLEKGFGKASHNYISLQ 359
KQTFSV++DY+KLI +ALDIL+P GTIIASTNAAN+TVSQFKKQ+ KGFG ++LQ
Sbjct: 301 KQTFSVSRDYHKLITEALDILSPKGTIIASTNAANMTVSQFKKQIIKGFGSRRPESMTLQ 360

Query: 360 QLPEDFTINDKDQQSNYLKVFTIKVK 385
QLP DFTIN D++SNYLKVFTIKV+
Sbjct: 361 QLPSDFTINKADERSNYLKVFTIKVR 386





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 612 

A DNA sequence (GBSx0652) was identified in S.agalactiae <SEQ ID 1899> which encodes the amino 
acid sequence <SEQ ID 1900>. This protein is predicted to be minimal change nephritis transmembrane 
glycoprotein. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.3739 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12545 GB:Z99107 alternate gene name: yetP-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 299/676 (44%) , Positives = 415/676 (61%) , Gaps = 33/676 (4%) 

Query: 2 KKIKDFASRAINTRLGFILLLWIYWLKTIWAYHTDFNLGLENSYQLFLTIINPIPLGLL 61 

KK++ + + +L F +L V+++W KT +Y T+FNLG++ + Q L I NP + 
Sbjct: 9 KKVEVAMKKLFSYKLSFFVLAVILFWAKTYLSYKTEFNLGVKGTTQEILLIFNPFSSAVF 68 

Query: 62 IIGLALYVKRTKAFYITAFITYAIVNILLIANAIYYREFSDFITVSAVLASSKTSAGLGD 121 

+GLAL K K+ I I + ++ +L AN ++YR F DF+T + S +GD 
Sbjct: 69 FLGLALLAKGRKSAIIMLIIDF-LMTFVLYANILFYRFFDDFLTFPNIKQSGNVG-NMGD 126 

Query: 122 SALNLLRI WDL VYVFDFI ILI FLFATKKIHLDDRPFNKRAS FS ITALSGL - LFS INLFLA 180 

+++ D+ Y D IILI + + L + KR + S+ LSG+ LF INL A 
Sbjct: 127 GIFS IMAGHDI FYFLDI I ILIAVLIWRP -ELKEYKMKKRFA- SLVILSGIALFFINLHYA 184 



Query: 181 EIDRPELLSRGFS^r^YIVKALGLPSFSIYSGNQTYQAQKERNGATAQELATAKKYVAEHY 240 
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E DRP+LL+R F YIVK LGL +++IY G QT Q + +R A++ +L + + Y HY 
Sbjct: 185 EKDRPQLLTRTFDRNYIVKYLGLYNYTIYDGVC3TAQTETQRAYASSDDLTSVENYTTSHY 244 

Query: 241 AKPNPEYYGIGKGRNVIMIHLESFQQFLIDYKLNIDGKEHVVTPFINSLYHSKETVS-FS 299

AKPN EY+G KG+N+I IHLESFQ FLIDYKLN G+E VTPF+N L H E V+ F
Sbjct: 245 AKPNAEYFGSAKGKNIIKIHLESFQSFLIDYKLN--GEE--VTPFLNKLAHGGEDVTYFD 300

Query: 300 NFFHQVKAGKTSDAETLMENSLFGLSSGSFMVNYGGENTQFAAPHILAQNGGYSSAVFHG 359 

NFFHQ GKTSDAE M+NS+FGL GS V GENT + P IB Q GY+SAV HG 
Sbjct: 301 NFFHQTGQGKTSDAELTMDNS I FGLPEGSAFVT- KGENTYQSLPAI LDQKEGYTSAVLHG 359 

Query: 360 NVGTFWNRNNAYKQWGYDYFFDSSYFSKQTKDNSFQYGLNDKYMFADSIKYLEHMQQPFY 419

+ +FWNR+ YK GYD FFD+S + + +N GL DK F +SI LE ++QPFY 
Sbjct: 360 DYKSFWNRDQIYKHIGYDKFFDASTYD-MSDFJWIlMGIjKDKPFFTESIPKLESLKQPFY 418 

Query: 420 TKFITVSNHYPYTSLKGESDEEGFPLAKTNDETINGYFATANYLDTALKSFFEYLKAAGV 479

IT++NHYP+ + + A T D T++ YF TA YLD AL+ FF+ LK AG+ 

Sbjct: 419 AHLITLTNHYPFNL DEKDASLKKATTGDNTVDSYFQTARYLDEALEQFFKELKEAGL 475 

Query: 480 YDNS IIVMYGDHYGISNTRNPSLAELLGKDPETWSEYDNAMLQRVPYMIHI PGYSKGFI S 539 

YDNS+I++YGDH GIS N ++ E+LGK+ ++Y NA QRVP MI +PG KG ++ 
Sbjct: 476 YDNSVIMI YGDHNGI SENHNRAMKE I LGKE ITDYQNAQNQR VPLMIRVPG - KKGGVN 531 

Query: 540 NTYGGEVDNLPTLLHILGIDTSKYTQLGQDLLSKDNKQMVAMRTTGQYITPKYTNYSGHL 599 

+TYGGE+D +PTLLH+ GID+ KY G DL SKD+ VA R G ++TPKYT+ + 
Sbjct: 532 HTYGGEIDVMPTLLHLEGIDSQKYINFGTDLFSKDHDDTVAFR-NGDFVTPKYTSVDNII 590 

Query: 600 YYTDSGQEITNPDETTKAEIKAIRDATNKQLSTSDSIQTGDLLRFDENNGLKTVEVEKFN 659 

Y T +G+++ +ET K ++ N+QLS SDS+ DLLRF + N K V+ ++ 

Sbjct: 591 YDTKTGEKLKANEET KNLKTRWQQLSLSDSVLYKDLLRFHKLNDFKAVDPSDYH 645

Query: 660 YTHSLKALKAKERKLK 675 

Y KE+++K 
Sbjct: 646 Y -GKEKEIK 653 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1901> which encodes the amino acid 
sequence <SEQ ID 1902>. Analysis of this protein sequence reveals the following: 
Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.3739 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 533/713 (74%), Positives = 603/713 (83%)

Query: 1 MKKIKDFASRAINTRLGFILLLWIYWLKTIWAYHTDFNLGLENSYQLFLTIINPIPLGL 60 

+KK K + INTRLGFI+ L+ YW+KT+WAYHTDF+L L N YQ+FLTIINPIPL 
Sbjct: 16 VKKFKTLITGFINTRLGFIITLLFCYWIKTLWAYHTDFSLDLGNIYQVFLTIINPIPLAF 75 

Query: 61 LIIGLALYVKRTKAFYITAFITYAI vNILLIANAIYYREFSDFITVSAVLASSKTSAGLG 120 

L++G+ALYVK T+AFYI +++ Y I+NILLI+N+IYYREFSDFITVSA+LASSK SAGLG 
Sbjct: 76 LLLGVALYVKNTRAFYICSWVVYIILNILLISNSIYYREFSDFITVSAMLASSKVSAGLG 135 

Query: 121 DSALNLLRIWDLVYVFDFIILIFLFATKKIHLDDRPFNKRASFSITALSGLLFSINLFLA 180 

DSALNLLRIWD++Y+ DFIILI L KKI D RPFNKRA+F+ITALS LL SINLFLA 
Sbjct: 136 DSALNLLRIWDIIYILDFIILISLSIAKKIKNDQRPFNKRAAFAITALSSLLLSINLFLA 195 
Query: 181 EIDRPELLSRGFSNTYIVKALGLPSFSIYSGNQTYQAQKERNGATAQELATAKKYVAEHY 240

EIDRPELL+RGFSNTYIV+ALGLP+F++YSGNQTYQAQKERNGATA+EL K YV HY 
Sbjct: 196 EIDRPELLTRGFSNTYIVRALGLPAFTLYSGNQTYQAQKERNGATAEELIDVKTYVKGHY 255 

Query: 241 AKPNPEYYGIGKGRNVIMIHLESFQQFLIDYKLNIDGKEHVVTPFINSLYHSKETVSFSN 300

A P+P+Y+GIGKG+N+I++HLESFQQFLIDYKL KE+ VTPFINSLYHS T++F N 
Sbjct: 256 AAPDPQYFGIGKGKNIIVLHLESFQQFLIDYKLKEGDKEYEVTPFINSLYHSNATLAFPN 315 

Query: 301 FFHQVKAGKTSDAETLMENSLFGLSSGSFMVNYGGENTQFAAPHILAQNGGYSSAVFHGN 360
FFHQVKAGKTSDAET+MENSLFGL+SGSFMVNYGGENTQFA P ILAQ GGY+SAVFHGN

Sbjct: 316 FFHQVKAGKTSDAETMMENSLFGLNSGSFMVNYGGENTQFATPSILAQKGGYTSAVFHGN 375

Query: 361 VGTFWNRNNAYKQWGYDYFFDSSYFSKQTKDNSFQYGLNDKYMFADSIKYLEHMQQPFYT 420
VGTFWNRNNAYKQWGY+YFFDSSYFSKQ NSFQYGLNDKYMF DSIKYLE MQQPFYT
Sbjct: 376 VGTFWNRNNAYKQWGYNYFFDSSYFSKQNSKNSFQYGLNDKYMFKDSIKYLEQMQQPFYT 435

Query: 421 KFITVSNHYPYTSLKGESDEEGFPLAKTNDETINGYFATANYLDTALKSFFEYLKAAGVY 480

KFITVSNHYPYTSLKGES EEGFPLAKT+DETINGYFATANYLD ALKSFF+YLKA G+Y
Sbjct: 436 KFITVSNHYPYTSLKGESSEEGFPLAKTDDETINGYFATANYLDAALKSFFDYLKATGLY 495


Query: 481 DNSIIVMYGDHYGISNTRNPSLAELLGKDPETWSEYDNAMLQRVPYMIHIPGYSKGFISN 540

DNSI V+YGDHYGISN+RN SLA LLGKD ETWSEYDNAMLQRVPYMIHIPGY+ G I
Sbjct: 496 DNSIFVLYGDHYGISNSRNSSLAPLLGKDSETWSEYDNAMLQRVPYMIHIPGYTNGSIKE 555 

25 Query: 541 TYGGEVDNLPTLLHILGIDTSKYTQLGQDLLSKDNKQMVAMRTTGQYITPKYTNYSGHLY 600 

T+GGE+D LPTLLHILGIDTS++ QLGQDLLS N Q+VA RT+G Y+TP+YTNYSG LY 
Sbjct: 556 TFGGEIDALPTLLHILGIDTSQFVQLGQDLLSPQNSQIVAQRTSGTYMTPEYTNYSGRLY 615 

Query: 601 YTDSGQEITNPDETTKAEIKAIRDATNKQLSTSDSIQTGDLLRFDENNGLKTVEVEKFNY 660 
T +G EITNPDE T A+ K IR A +QL+ SD+IQTGDLLRFD NGLK ++ +F Y

Sbjct: 616 NTQTGLEITNPDEMTIAKTKEIRSAVAQQLAASDAIQTGDLLRFDTQNGLKAIDPNQFIY 675

Query: 661 THSLKALKAKERKLKDRSTSIYSKHNNKSTVDLFHAPSYLELQDPNKTHKTSK 713 
T LK LK KL STS+YSK+ +KST LF APSYLEL TS+ 
Sbjct: 676 TKQLKQLI031SAKLGSESTSLYSKNGHKSTQKLFKAPSYLEIiNPVEADAATSE 728

A related GBS gene <SEQ ID 8619> and protein <SEQ ID 8620> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 9
McG: Discrim Score: 12.63

GvH: Signal Score (-7.5): -2.99

Possible site: 30
>>> Seems to have an uncleavable N-term signal seq
modified ALOM score: 1.87

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.3739 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:



45.2/63.1% over 643aa

Bacillus subtilis

EGAD|107893| hypothetical protein Insert characterized
GP|2116767|dbj|BAA20118.1||D86418 YfnI Insert characterized
GP|2633039|emb|CAB12545.1||Z99107 alternate gene name: yetP-similar to hypothetical proteins Insert characterized
PIR|D69815|D69815 conserved hypothetical protein yfnI - Insert characterized

ORF00125(286 - 2280 of 2742)

EGAD|107893|BS0726(3 - 646 of 653) hypothetical protein {Bacillus subtilis} GP|2116767|dbj|BAA20118.1||D86418 YfnI {Bacillus subtilis} GP|2633039|emb|CAB12545.1||Z99107 alternate gene name: yetP-similar to hypothetical proteins {Bacillus subtilis} PIR|D69815|D69815 conserved hypothetical protein yfnI - Bacillus subtilis

%Match = 28.5

%Identity = 45.1 %Similarity = 63.1

Matches = 297 Mismatches = 227 Conservative Sub.s = 118

36 66 96 126 156 186 216 246 

FVVKDRPSLRIDLTVKKVEPTG*LNWYQNLFFPVTEHIiI*FFFQRQNSL*VYS*TVL*QI 

276 306 336 366 396 426 456 486 

II*SEILSLGKKliKEVPTvTCKIKDFASRAINTRLGFILLIiVVIYWLKTIWAYHTDFNLGLENSYQLFLTIINPIPLGLLI 




10 20 30 40 50 60 



516 546 576 606 636 666 696 726 

IGIALYVKRTKAFYITAFITYAITOILLIANAIYYREFSDFIWSAVIiASSKTSAGLGDSAIJ^LRIWDLVYVFDFIILI 



LGIJUiLAKGRKSAIIMLIIDF-LMTFVLYAMILFYRFFDDFLTFPNI-KQSGNVGNMGDGIFSIMAGHDIFYFLDIIILI 
80 90 100 110 120 130 140 



756 786 843 873 903 933 963 

FLFATKKIHLDDRPFNKRASFSITALSGL-LFSINLFLAEIDRPELLSRGFSNTYIVKALGLPSFSIYSGNQTYQAQKER 



-AVLITOPELW^KKR-FASLVILSGIALFFIlSmHYAEKDRPQLLTRTFDRNYIVKYLGLYNYTIYDGVQTAQTETQR 
160 170 180 190 200 210 220 



993 1023 1053 1083 1113 1143 1173 1200 

NGATAQEIATAKKYVAEHYAKPNPEYYGIGKGKOTIMIHLESFQQFLIDY 



AYASSDDLTSVENYTTSHYAKPNAEYFGSAKGKNIIKIHLESFQSFLIDYKLN GEEVTPFLNKLAHGGEDVTYFDN 

240 250 260 270 280 290 300 



1230 1260 1290 1320 1350 1380 1410 1440 

FFHQVKAGKTSDAETLMENSLFGLSSGSFMVOTGGEIWQFAAPHIIA 




320 330 340 350 360 370 380 



1470 1500 1530 1560 1590 1620 1650 1680 

DSSYFSKQTKDNSFQYGLNDKYMFADSIKYLEHMQQPFYTKFITVSNHYPYTSLKGESDEEGFPLAKTNDETINGYFATA 




- TYDMSDENVINMGLKDKPFFTES I PKLESLKQPFYAHLITLTNHYPF -NLD - EKDAS - LKKATTGDNTVDSYFQTA 



390 400 410 420 430 440 450 



1710 1740 1770 1800 1830 1860 1890 1920 

NYLDTALKSFFEYLKAAGVYDNSIIv^GDHYGISNTRNPSLAELLGKDPETWSEYDNAMLQRVPYMIHIPGYSKGFISN 



RYLDEALEQFFKELKEAGLYDNSVIMIYGDHNGISENHNRAMKEILGK EITDYQNAQNQRVPLMIRVPG-KKGGVNH 

470 480 490 500 510 520 530 



1950 1980 2010 2040 2070 2100 2130 2160 

TYGGEVDNLPTLLHILGIDTSKYTQLGQDLLSKDNKQMVAMRTTGQYITPKYTNYSGHLYYTDSGQEITNPDETTKAEIK 



TYGGEIDVMPTLLHLEGIDSQKYINFGTDLFSKDHDinvaFR-NGDFVTPKYTSVDNIIYDTKTGEKL 
550 560 570 580 590 



600 



KANEETK 



2190 2220 2250 2280 2310 2340 2370 2400 

AIRDATNKQLSTSDSIQTGDLLRFDENNGLKTTO^ 





620 630 640 650 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 613

A DNA sequence (GBSx0653) was identified in S.agalactiae <SEQ ID 1903> which encodes the amino 
acid sequence <SEQ ID 1904>. This protein is predicted to be 50S ribosomal protein L20 (rplT). Analysis 
of this protein sequence reveals the following: 

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3392 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9387> which encodes amino acid sequence <SEQ ID 9388> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14845 GB:Z99118 ribosomal protein L20 [Bacillus subtilis]

Identities = 70/89 (78%), Positives = 78/89 (86%)

Query: 1 MFRTAKEQVIWSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV 60
+++ A +QVM S YA+RDRRQKKRDFRKLWITRINAAARMNGLSYS+LMHGLKL+ IEV
Sbjct: 31 LYKVANQQVMKSGNYAFRDRRQKKRDFRKLWITRINAAARMNGLSYSRLMHGLKLSGIEV 90

Query: 61 NRKMLADLAVNDAAAFTALADAAKAKLGK 89 

NRKMLADLAVND AF LADAAKA+L K 
Sbjct: 91 NRKMLADLAVNDLTAFNQLADAAKAQLNK 119 


A related DNA sequence was identified in S.pyogenes <SEQ ID 1905> which encodes the amino acid 
sequence <SEQ ID 1906>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.06 Transmembrane 94 - 110 ( 94 - 110)

Final Results

bacterial membrane Certainty=0.1022 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 87/89 (97%), Positives = 88/89 (98%)

Query: 1 MFRTAKEQVIWSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV 60

+FRTAKEQVIWSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV
Sbjct: 31 LFRTAKEQVIWSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV 90

Query: 61 NRKMLADLAVNDAAAFTALADAAKAKLGK 89
NRKMLADLAV DAAAFTALADAAKAKLGK

Sbjct: 91 NRKMLADLAVADAAAFTALADAAKAKLGK 119 
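
The identity count for an aligned block can be checked directly from the two aligned strings. The sketch below assumes equal-length aligned strings with `-` as the gap character and uses the second block of the alignment above (residues 61-89 of the query); `identity_stats` and `identity_line` are hypothetical helper names. It counts identities only: the "Positives" figure also depends on a substitution matrix, which is not reproduced here.

```python
def identity_stats(query, sbjct, gap="-"):
    """Return (identities, aligned_length, gap_columns) for two aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for q, s in pairs if q == s and q != gap)
    gap_columns = sum(1 for q, s in pairs if q == gap or s == gap)
    return identities, len(pairs), gap_columns

def identity_line(query, sbjct, gap="-"):
    """Build a middle line showing identical residues only (no '+' marks)."""
    return "".join(q if q == s and q != gap else " " for q, s in zip(query, sbjct))

query = "NRKMLADLAVNDAAAFTALADAAKAKLGK"  # Query residues 61-89
sbjct = "NRKMLADLAVADAAAFTALADAAKAKLGK"  # Sbjct residues 91-119

print(identity_stats(query, sbjct))
print(identity_line(query, sbjct))
```

For this block the only difference is the N/A pair at the eleventh column, which is the gap visible in the middle line of the alignment.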

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 614

A DNA sequence (GBSx0654) was identified in S.agalactiae <SEQ ID 1907> which encodes the amino 
acid sequence <SEQ ID 1908>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.64 Transmembrane 32 - 48 ( 32 - 48) 
INTEGRAL Likelihood = -0.32 Transmembrane 3 - 19 ( 3-19) 

Final Results 

bacterial membrane Certainty=0.1256 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 615 

A DNA sequence (GBSx0655) was identified in S.agalactiae <SEQ ID 1909> which encodes the amino 
acid sequence <SEQ ID 1910>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.6052 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9349> which encodes amino acid sequence <SEQ ID 9350> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB89436 GB:AE000977 A. fulgidus predicted coding region AF1820 
[Archaeoglobus fulgidus] 
Identities = 100/483 (20%), Positives = 210/483 (42%), Gaps = 61/483 (12%)

Query: 351 LFPI ILYLVAALVTLTTMTRFVEEERTNAGILKALGYSDRQVI FKFI I YGFIAGTLGTTL 410 

LFP LV+ +T ++R + N +++ALG++ +++ ++ Y + G +T 
Sbjct: 276 LFPAFFILVSIFMTYALLSRIFRLQLGNIAVMRALGFTRNEIMLHYLQYPLLMGFFASTA 335 

Query: 411 GIIGGHYLLPRIISDIISKDLTIPNTQYHLFIjNYSLLAFWSLLSIVLPVFVI 463 

G++ G + + S 1+ L +P L L+ + L+ + F++ 

Sbjct: 336 GLVAGFFASQLLTSQYIT-FLNLPYYVSKPHLEVYSLSLMAGTLTPTISGFLVAYQASRV 394 

Query: 464 TRRELKEKAAFLLLPKPPAKGSKIALEYINWIWKKLSFTQKVTARNIFRYKQRMIM 519 

R E AA + + A S+I W ++ ++ RNIFR K+R + 
Sbjct: 395 DIVKALRGYAEVAAVSFIARIDALFSRI W RMRLIFRLALRNIFRSKRRTAI 445 

Query: 520 TIFGVAGSVALLFSGLGIQSSLKQTVNEHFGRIMPYDILLTYNTNASPPKILELLSKDSK 579 
+IF + +L+ + + S + FG++ YDI ++ E+L K K

Sbjct: 446 SIFSIVACTSLIIiNSMVFVDSFDYVMQLQFGKVYAYDIKVSLEGYDGK EVLEKVRK 501 

Query: 580 IDKY QPIHLENLDESIPGQINKQSISLFITDKKQLLPFIYLQEATTNKSLHL 631 

+D PI++E E++P +L I Q L +Y E +

Sbjct: 502 MDGVLFAEPAVEMPIYVEKGGEAVP TLLIASNFQTLYNVYNAEG EKLI 549 

Query: 632 MNKGIIISKKLAQFYHVNTGDFIHL SHSQTLPSRKLKITGWNANVGHYIFMTK 685 

++GII SK + + G+ + + ++ + + V A++ 
Sbjct: 550 PSEGIIFSKTAMKNLSLVEGEKVSVYTEFGKLEAEVEDVEMIPLLSVATASL 601

Query: 686 QYYRTIFKKEAKDNAFLVKLTKHKIANNLAEKLLEINGVESLTQNALQLASVEAVVRSLD 745 

Y+ I + N +V + +IA +AEK+ +++GV+ ++ S+E ++ 

Sbjct: 602 DYFSRISGVDG-FNRIWDADEGRIA-EIAEKIRQMDGVKKVSTVIEAQESIEELMGFFY 659 


Query: 746 GSMTILVWSLLLAIVILYNLTNINLAERKRELSTIKVLGFYNEEVTLYIYRETIILSTI 805 

+ + + L ++N T+I++ ER REL+T+++LG+ + E+ + + E + ++ + 
Sbjct: 660 AF IAFSLFFGVSLGFAAVFNTTS I SVIERSRELATLRMLGYTSRE III SLI LENLFVAI L 719 

Query: 806 GVI 808

G++ 

Sbjct: 720 GLV 722 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1911> which encodes the amino acid
sequence <SEQ ID 1912>. Analysis of this protein sequence reveals the following:

Possible site: 34
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -14.33 Transmembrane 749 - 765 ( 739 - 775)
INTEGRAL Likelihood = -10.88 Transmembrane 845 - 861 ( 834 - 865)
INTEGRAL Likelihood = -6.64 Transmembrane 350 - 366 ( 344 - 369)
INTEGRAL Likelihood = -6.53 Transmembrane 22 - 38 ( 19 - 42)
INTEGRAL Likelihood = -6.32 Transmembrane 520 - 536 ( 515 - 537)
INTEGRAL Likelihood = -4.99 Transmembrane 446 - 462 ( 445 - 465)
INTEGRAL Likelihood = -2.92 Transmembrane 396 - 412 ( 395 - 413)
INTEGRAL Likelihood = -0.80 Transmembrane 800 - 816 ( 800 - 819)

Final Results

bacterial membrane Certainty=0.6731 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB89436 GB:AE000977 A. fulgidus predicted coding region AF1820 
[Archaeoglobus fulgidus] 
Identities = 101/542 (18%), Positives = 237/542 (43%), Gaps = 42/542 (7%)

Query: 350 IFPWLYLVAALVAFTTMTRYVDEERTSSGLLKAIGYSNKDISLKFLIYGLLASFLGTTL 409 

+FP LV+ + + ++R + + +++A+G++ +1 L +L Y LL F +T 
Sbjct: 276 LFPAFFILVSIFMTYALLSRIFRLQLGNIAVMRALGFTRNEIMLHYLQYPLLMGFFASTA 335 


Query: 410 GIIGGTYLLSTLISEILTGA LTIGKTHLYSYWFYNGIAYLLAMLSAVLPAYLIVKKE 466 

G++ G+ LS++T +KHLY L+SLAY+ + 

Sbjct: 336 GLVAGFFASQLLTSQYITFLNLPYYVSKPHLEVYSLSLMAGTLTPTISGFLVAYQASRVD 395 

Query: 467 LFLN AAQLLLPKPPSKGAKIWLEHLTFVWKALSFTHKVTIRNIFRYKQRMLMT 519

+ AA + + + ++IW L F ++ +RNIFR K+R ++ 

Sbjct: 396 IVKALRGYAEVAAVSFIARIDALFSRIWRMRLIF RLALRNIFRSKRRTAIS 446 

Query: 520 IVGVAGSVALLFAGLGIQSSLAKVVEHQFGDLTTYDILAVGSAKATATEQTDLASYLKQE 579 
I + +L+ + S V++ QFG + YDI + L Y +E

Sbjct: 447 IFSIVACTSLILNSMVFVDSFDYVMQLQFGKVYAYDI KVSLEGYDGKE 494 

Query: 580 PITGYQKVSYASLTLPVKGLP DKQSISILSSS-ATSLSPYFNLLDSQEQKKVPIPTS 635 

+ +K+ P +P +K ++ + A++ +N+ +++ +K IP+ 

Sbjct: 495 VLEKVRKMDGVLFAEPAVEMPIYVEKGGEAVPTLLIASNFQTLYNVYNAEGEKL--IPSE 552

Query: 636 GVLISEKIASYYKVKPGDQLVLTDRKGQSYKVTIKQVIDMTVGHYLIMSDTYFKNHFKGL 695

G++ S+ + G+++ + G+ ++ ++ L+ T ++F + 

Sbjct: 553 GIIFSKTAMKNLSLVEGEKVSVYTEFGK LEAEVED VEMI PLLSVATASLDYFSRI 607 


Query: 696 EAAPAYLIKVKDKDSKHIKETASDLLTLKAIRAVSQWVNHIKSVQLVVTSLNQVMTLLVF 755 

+ VDD IEA + + ++VS+ +S++ ++ + +F 

Sbjct: 608 SGVDGFMRIVVDADEGRIAEIAEKIRQMDGVKKVSTVIEAQESIEELMGFFYAFIAFSLF 667 

Query: 756 LSILLAIVILYNLTTINIAERIRELSTIKVLGFYDQEVTLYIYRETISLSLVGILLGIYL 815

+ L ++N T+I++ ER REL+T+++LG+ +E+ + + E + ++++G++ + + 
Sbjct: 668 FGVSLGFAAVFNTTSISVIERSRELATLRMLGYTSREIIISLILENLFVAILGLVFALPI 727 

Query: 816 GKGLHTYIMTMISTGDIQFGVKVDAYOTLVPILVILSL1AVLGIWVNRHLKKVDMLEALK 875 
+++ ++++L+ +++ + + R + ++D+ + K

Sbjct: 728 AYSTAYFFFSSFESELYYMPMVIYPRTFAATVLAVFAIILLALLPSARRVSEMDIAKVTK 787 

Query: 876 SI 877 
I 

Sbjct: 788 EI 789

An alignment of the GAS and GBS proteins is shown below: 

Identities = 377/857 (43%) , Positives = 543/857 (62%) , Gaps = 7/857 (0%) 

Query: 3 KTFWKDIYRSITTSKGRFSSILLLMMLGSFAFIGLKVSAPNMQRTAQNYLAHHHVMDITV 62

KT WKDI R+I SKGRF S+ LM LGSFA +GLKV+ P+M+RTA YL H VMD+TV 
Sbjct: 4 KTLWKDILRAIKNSKGRFISLFFLMALGSFALVGLKVTGPDMERTASRYLERHQVMDLTV 63 

Query: 63 FNSWGLDKHDQTVLESLKGSQVEFSYFVOTTPQQNSKSYRLYSOTKTISTFDLVKGRLPL 122 
S + D+ L++LKG+ +E+ + +D + N KS RLYS K +S LVKG P

Sbjct: 64 LASHQFSQADKQELDTLKGAHLEYGHLLDVSLTSNQKSLRLYSVPKKVSKPVLVKGSWPK 123 

Query: 123 NKSEIALSFQERKKYAIGDKINFKQDKNKLFSNTGPLTIVGFVNSTEIWSKTNLGSSQTG 182 
++++ LS K Y IGD++ L + T +VGF NS+E+WSK+NLGSS TG 

Sbjct: 124 RETDLVLSSSLAKNYQIGDELAVTSPMEGLLTTTH - FQWGFANSSEVWSKSNLGSSSTG 182

Query: 183 DGDLDSYGVLDKTAFHSPVYTMARVTFKDLRLINPFSISYKEKVAKYQEKVSRKIjNIHNK 242 

DG h +Y ++ F S + + R+ F LRL N FS Y+++V + Q + L + + 
Sbjct: 183 DGSLYAYAFVNPNVFKS-AFNLLRIRFSHLRLTNAFSKDYQKRVTQNQAHLDNLLKDNGQ 241 


Query: 243 IRYTKTKKESLRKIDEEEKSLLKAQKQINRLDNDSLAMPLSQRQAIQMKIKQDRLSLLKR 302 

RY ++ + +LK ++ ++ + SQ + +I+Q + +L K 

Sbjct: 242 KRYDDLQNQYDLALKNGRAAIAKETVKLAASEENLTFLEGSALQEAKHQIEQGKQAIAKE 301 

Query: 303 TKELLKLRHm , QIMESPQIIVYNRTTFPGGQGYNTFDSSTNSTSKISNLFPIILYLVAAL 362

K+L +++ +E P + YNR+T PGG+GY+T+ +ST S S + N+FP++LYLVAAL 

Sbjct: 302 EKQLEQVQATKDKLEKPSYLTYNRSTLPGGEGYHTYATSTTSISNVGNIFPVVLYLVAAL 361 

Query: 363 VTLTTMTRFVEEERTNAGILKALGYSDRQVIFKFIIYGFIAGTLGTTLGIIGGHYLLPRI 422 
V TTMTR+V+EERT++G+LKA+GYS++ + KF+IYG +A LGTTLGIIGG YLL +

Sbjct: 362 VAFTTMTRYVDEERTSSGLLKAIGYSNKDISLKFLIYGLIASFLGTTLGIIGGTYLLSTL 421 

Query: 423 ISDIISKDLTIPNTQYHLFLOTSLLAFVFSLLSIVLPVFVITRRELKEKAAFLLLPKPPA 482 
IS+I++ LTI T + + Y+ +A++ ++LS VLP ++I ++EL AA LLLPKPP+ 
Sbjct: 422 ISEILTGALTIGKTHLYSYWFYNGIAYLLAMLSAVLPAYLIVKKELFLNAAQLLLPKPPS 481

Query: 483 KGSKIALEYIIWIWKKLSFTQKVTARNIFRYKQRMIMTIFGVAGSVALLFSGLGIQSSLK 542 

KG+KI LE++ ++WK LSFT KVT RNIFRYKQRM+MTI GVAGSVALLF+GLGIQSSL 
Sbjct: 482 KGAKIWLEHLTFVWKALSFTHKVTIRNIFRYKQRMLMTIVGVAGSVALLFAGLGIQSSLA 541 


Query: 543 QTVNEHFGRIMPYDILLTYNTNASPPKILELLS - -KDSKIDKYQPIHLENLDESIPGQIN 600 

+ V FG + YDIL + A+ + +L S K I YQ + +L + G + 
Sbjct: 542 KVVEHQFGDLTTYDILAVGSAKATATEQTDLASYLKQEPITGYQKVSYASLTLPVKGLPD 601 

Query: 601 KQSISLFITDKKQLLPFIYLQEATTOKSIinj^GIIISKKIAQFYHVOTGDFIHL 660

KQSIS+ + L P+ L ++ K + + G++IS+KLA +Y V GD + L+ + 
Sbjct: 602 KQSISILSSSATSLSPYFl^LDSQEQKKVPIPTSGVLISEKIASYYKVKPGDQLVLTDRK 661 



Query: 661 ?SRKLKITGVVNANVGHYIFMTKQYYRTIFKKEAKDNAFLVKL--TKHKIANNLAEKL 718
S K+ I V++ VGHY+ M+ Y++ FK A+L+K+ K A L
Sbjct: 662

Query: 719
L + + +++QN + SV+ W SL+ MT+LV +S+LIAIVILYNLT IN+AER REL
Sbjct: 721

Query: 779
STIKVLGFY++EVTLYIYRETI LS +G++LG G LH +M +I + I FG KV
Sbjct: 781

Query: 839
+++PI V++ +L L

Sbjct: 841 YVYLVPILVILSLLAVL 857

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 616

A DNA sequence (GBSx0656) was identified in S.agalactiae <SEQ ID 1913> which encodes the amino 
acid sequence <SEQ ID 1914>. This protein is predicted to be ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2757 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB89431 GB:AE000977 ABC transporter, ATP-binding protein 
[Archaeoglobus fulgidus] 
Identities = 112/230 (48%), Positives = 167/230 (71%)



Query: 4 IEMKHSYKRYQTGETE 1 VANNDI S FS IERGEL WI LGASGAGKSTVLNI LGGMDSNSEGE 63
+ ++ +K YQ G+ E+ A 1+ IERGE +V+LG SG GK+T+LNI +GG+D + G
Sbjct: 2 LRLEDWKVYQMGKVEVSALRGINLEIERGEFMVVLGPSGCGKTTMLNIIGGrDRPTRGR 61

Query: 64
V+ DGK+I NY LT +RR +VGF+FQF+NL+P LTA ENVE+A+++V D + L
Sbjct: 62

Query: 124
+ VGL R HFPA+LSGGEQQRVAIARA+ K P ++L DEPTG+LD++TGK VL ++++
Sbjct: 122

Query: 184
+ + + T ++VTHN+A+A IA+RV+++ D K+ + N +P+D I++
Sbjct: 182



There is also homology to SEQ ID 1354. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 617

A DNA sequence (GBSx0657) was identified in S.agalactiae <SEQ ID 1915> which encodes the amino
acid sequence <SEQ ID 1916>. This protein is predicted to be DNA topoisomerase I (topA). Analysis of
this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4716 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9821> which encodes amino acid sequence <SEQ ID 9822> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13485 GB:Z99112 DNA topoisomerase I [Bacillus subtilis]
Identities = 442/690 (64%), Positives = 535/690 (77%), Gaps = 10/690 (1%)



Query: 27 LVIVESPAKAKTIEKYLGRNyKWASVGHIRDLKKSSMSIDFENNYEPQYINIRGKGPLI 86
LVIVESPAKAKTIE+YLG+ YKV AS+GH+RDL KS-M +D E N+EP+YI IRGKGP++
Sbjct: 5

Query: 87 NDLKKEAKKAKKOTLASD 146
4-LK AKKAKKVYLA+DPDREGEAI + WHLAH LDLD RWFNE I TKDA+ K +F
Sbjct: 65 KELKTAAJCKAKrWY LAAD PDREGEAI 124

Query: 147 PRQIiWDLVDAQQJ^VLDRIVCT^ 206
PR IWMDLVnAQQARR+LDR+VGY ISPILWKKVKKGLSAGRVQSVAL+LIIDRE EI
Sbjct: 125

Query: 207 FQPEEYVfTIDGSFKKGTRKFNATFYGLDGKKFKIiSNNEDVKTVLKRIKTDEFLVEKVEKK 266
F+PEEYWTIDG+F KG F A+F+G +GKK L++ DVK +L ++K +++ VEKV KK
Sbjct: 185 FKPEEYWTIDGTFLKGQETFEASFFGKNGKKLPLNSEADVKEILSQLKGNQYTVEKVTKK 244

Query: 267 ERRRNAPLPYTTSSLQQDAANKINFRTRKTMMIAQQLYEGLSLGTAGHQGLITYMRTDST 326
ER+RN LP+TTS+LQQ+AA K+NFR +KTMMIAQQLYEG+ LG G GLITYMRTDST
Sbjct: 245 ERKRNPALPFTTSTLQQEAARKl^FRAKKTMMIAQQLYEGIDLGREGTVGLITYiMRTDST 304

Query: 327 RISPLAQNEATEFITNRFGANYSKHGNK-VKNASGAQDAHEAIRPSSVNHTPESIAKYLD 385
RIS A +EA FI +G + K K AQDAHEAI RP+ SV P + L
Sbjct: 305 RISNTAVDFAAR.FIDQTYGKEFLGGKRKPAKKNENAQDAHEAIRPTSVLRKPSELKAVLG 364

Query: 386 KDQLKLYTLIVmRFIASQMTAAVFDTMKVNLTQNGVTFIANGSQVKFDGYiMAVYND 441
+DQ++LY LIW RF+ASQM AV DTM V+LT NG+TF ANGS+VKF G+M VY +
Sbjct: 365 RDQMRLYKLIWERFVASQl^PAVLDTMSVDLTNNGLTFRANGSIWKFSGFMKvYW 424

Query: 442 --TDKNKMLPDMEEGESWKWTNPEQHFTQPPARFSEASLIKTLEENGVGRPSTYAPTL 499
+K++MLPD++EG++V + PEQHFTQPP R++EA L+KTLEE G+GRPSTYAPTL
Sbjct: 425 QMEEKDRMLPDLQEGDTVLSKDIEPEQHFTQPPPRYTEARLVKTLEERGIGRPSTYAPTL 484

Query: 500 ETIQKRYYVKIAAKRFEPTELGEIVNSLIVEFFPDIVDVTFTAEMEGKLDEVEIGKEQWQ 559
+TIQ+R YV L KRF PTELG+IV LI+EFFP+I++V FTA+ME LD VE G +W
Sbjct: 485 DTIQRRGWALDNKRFVPTELGQIVLDLIMEFFPEIINVEFTAKMERDIiDHVEEGNTEWV 544

Query: 560 KIIDEFYKPFEKEIAKAETEMEKIQIKDEPAGFDCELCGSPMVIKLGRYGKFYACSNFPE 619
KIID FY FEK + KAE+EM++++I+ E AG DCELC SPMV K+GRYGKF ACSNFP+
Sbjct: 545 KIIDNFYTDFEKRVKKAESEMKEVEIEPEYAGEDCELCSSP1WYKMGRYGKFLACSNFPD 604

Query: 620 CHNTKAITKEIGVICPICQKGQVIERKTKRNRIFYGCDRYPECEFTSWDKPIGRTCPKSN 679
C NTK I K+IGV CP C +G ++ERK+K+ R+FYGCDRYP+CEF SVTOKPI R CPK
Sbjct: 605 CRNTKPIVKQIGWCPSCGEGNIVERKSKKKRVFYGCDRYPDCEFVSWDKPIERKCPKCG 664

Query: 680 DFLVEKKVRGGGKQWCSNEKCDYQEEKIK 709
LVEKK++ G QV C +CDY+EE K
Sbjct: 665 KMLVEKKLK-KGIQVQC- -VECDYKEEPQK 691

A related DNA sequence was identified in S.pyogenes <SEQ ID 1917> which encodes the amino acid
sequence <SEQ ID 1918>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5445 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
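
The "Final Results" lines above follow a fixed pattern: a localisation site, a certainty score and a verdict in parentheses. As an illustrative sketch only, and not part of the analysis described in this document, such lines could be read programmatically as follows. The function names `parse_psort_line` and `most_likely_site` are hypothetical, and the tolerance for stray spaces inside the number is an assumption made for this example.

```python
import re

# Hypothetical helper: the pattern mirrors the "Final Results" lines in this text.
# It tolerates an optional dash separator and stray spaces inside the score.
_PSORT_LINE = re.compile(
    r"bacterial\s+(\w+)\s*[-\u2014]*\s*Certainty\s*=\s*([\d. ]+?)\s*\(([^)]*)\)"
)


def parse_psort_line(line):
    """Return (site, certainty, verdict), or None if the line does not match."""
    match = _PSORT_LINE.search(line)
    if match is None:
        return None
    site, raw_score, verdict = match.groups()
    # Remove spaces that crept into the number, e.g. "0 . 5445".
    certainty = float(raw_score.replace(" ", ""))
    return site, certainty, verdict.strip()


def most_likely_site(lines):
    """Pick the parsed localisation with the highest certainty, if any."""
    parsed = [p for p in map(parse_psort_line, lines) if p is not None]
    return max(parsed, key=lambda p: p[1]) if parsed else None
```

Applied to the three lines reported for this protein, `most_likely_site` would select the cytoplasm entry, since it carries the only non-zero certainty.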

An alignment of the GAS and GBS proteins is shown below:

Identities = 595/704 (84%), Positives = 656/704 (92%), Gaps = 1/704 (0%) 

Query: 6 TTTKTSTKKTSKKKSATAKKNLVIVESPAKAKTIEKYLGRNYKVVASVGHIRDLKKSSMS 65 
T KT TKK++ KK +TAKKNLVIVESPAKAKTIEKYLGR+YKWASVGHIRDLKKSSMS 
Sbjct: 7 TKPKTGTKKSTTKKKSTAKKNLVIVESPAKAKTIEKYLGRSYKVVASVGHIRDLKKSSMS 66

Query: 66 IDFENNYEPQYINIRGKGPLINDLKKEAKKAKKVYLASDPDREGEAISWHLAHILDLDKE 125 

IDF+NNYEPQYINIRGKGPLIN LKKEAK AKKVYLASDPDREGEAISWHL+HIL LD + 
Sbjct: 67 IDFDNNYEPQYINIRGKGPLINSLKKEAKAAKKVYLASDPDREGEAISWHLSHILGLDPQ 126 

Query: 126 DRNRWFNEITKDAVKNAFVEPRQINMDLVDAQQARRVLDRIVGYSISPILWKKVKKGLS 185 

D NRWFNEITKDAVK+AFVEPRQI+MDLVD+QQARRVLDRIVGYSISPILWKKVKKGLS 
Sbjct: 127 DNNRWFNEITKDAVKmFvEPRQIDMDLVDSQQARRVLDRIVGYSISPILWKKVKKGLS 186 

Query: 186 AGRVQSVALKXjIIDRENEIKAFQPEEYWTIDGSFKKBTRKFNATFYGLDGKKFKLSNNED 245

AGRVQSVALKLIIDREN+IKAF P+EYW+IDG FKKGT+KF ATFYG++GKK KL NN D 
Sbjct: 187 AGRVQSVALKLIIDRENDIKAFVPKEYWSIDGLFKKGTKKFQj\TFYGINGKKTKLDNNND 246 

Query: 246 VKTVLKRIKTDEFLVEKVEKKERRRNAPLPYTTSSLQQDAANKINFRTRKTMMIAQQLYE 305 
VK VL ++ ++FLV KV+KKERRRNAPLPYTTSSLQQDAANKINFRTRKTMM+AQQLYE

Sbjct: 247 VKEVIiAKLTNEDFLVSKOTKKERRRNAPIjPYTTSSLQQnAANKINFRTRKTt®WAQQLYE 306 

Query: 306 GLSLGTAGHQGLITYMRTDSTRISPLAQNEATEFITNRFGANYSKHGNKVKNASGAQDAH 365 
G+ LG G QGLITYMRTDSTRISP+AQN+A +FI NRFGANYSKHGN+VKN SG QDAH 
Sbjct: 307 GIHLGENGTQGLITYMRTDSTRISPVAQNDAAQFIINRFGANYSKHGNRVKNTSGVQDAH 366

Query: 366 EA1RPSSVNHTPESIAKYLDKDQLKLYTLIWNRFIASQMTAAVFDTMKVNLTQNGVTFIA 425 

EAIRPSSVNHTP+SIAKyL+KDQLKLYTLIWNRF+ASQMTAAVFDT+ICVNL QNGV F+A 
Sbjct: 367 FAIRPSSvNHTPDSIAKYLNKDQLKLYTLIWNRFVASQMTAAVFDTVKA^^EQNGVIFVA 426 

Query: 426 NGSQVKFDGYMAVYNDTDKNKMLPDMEEGESVKKVNTNPEQHFTQPPARFSEASLIKTLE 485 

NGSQ+KFDGYMAVYND+DKNKMLP+M EGE+VKK++T+PEQHFTQPPAR+SEA+LIKTLE 
Sbjct: 427 NGSQMKFDGYMAVYNDSDKNKMLPEMAEGETVKKISTSPEQHFTQPPARYSEATLIKTLE 486 

Query: 486 ENGVGRPSTYAPTLETIQKRYYVKIiAAKRFEPTELGEIvNSLIVEFFPDIVDVTFTAEME 545

ENGVGRPSTYAPTLE IQ+RYYVKL+AKRFEPTELGEIVN LIVEFFPDIVDV FTAEME 
Sbjct: 487 ENGVGRPSTYAPTLEVIQRRYYVKLSAKRFEPTELGEIVNKLIVEFFPDIVDVAFTAEME 546 

Query: 546 GKLDEVEIGKEQWQKIIDEFYKPFEKELAKAETEMEKIQIKDEPAGFDCELCGSPMVIKL 605 
GKLD+VEIG+EQWQ +ID+FY+PF KEL KAE+E+EKIQIKDEPAGFDC++CG PMVIKL

Sbjct: 547 GKLDQVEIGEEQWQHVIDQFYQPFVKEIiNKAESEIEKIQIKDEPAGFDCDVCGHPNIVIKL 606 

Query: 606 GRYGKFYACSNFPECHNTKAITKEIGVICPICQKGQVIERKTKRNRIFYGCDRYPECEFT 665 
GR+GKFYACSNFPEC NTKAITKEIGV CP+C RGQVTERKTK+NRIFYGCD+YP+CEF 
Sbjct: 607 GRFGKFYACSNFPECRNTKAITKEIGTCCPVCHKGQVTERKTKKNRIFYGCDQYPDCEFI 666

Query: 666 SWDKPIGRTCPKSNDFLVEKKVRGGGKQWCSNEKCDYQEEKIK 709 

SWD PIGR CPKS D+L+EKKVR GGKQV+CSNE CDY+EEKIK 
Sbjct: 667 SWDLPIGRACPKSGDYLIEKKVR-GGKQVMCSNETCDYKEEKIK 709 
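
Each alignment in this document is introduced by a summary line giving the identities, positives and (where present) gaps as a count over the aligned length, followed by a percentage. The following is a minimal sketch, under the assumption that the summary line keeps that textual form, of how the counts could be extracted and the raw fractions recomputed. The names `parse_alignment_summary` and `fraction` are hypothetical and are not taken from any alignment software.

```python
import re

# Hypothetical helper for summary lines such as
# "Identities = 595/704 (84%), Positives = 656/704 (92%), Gaps = 1/704 (0%)".
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_summary(line):
    """Map each field name to (count, aligned_length, reported_percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in _FIELD.findall(line)
    }


def fraction(summary, field):
    """Recompute the raw fraction for one field from its counts."""
    count, length, _reported = summary[field]
    return count / length
```

The recomputed fraction is returned as a plain float and is deliberately not compared against the reported percentage, because the rounding convention used to produce the reported figures is not stated in the text.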

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 618 

A DNA sequence (GBSx0658) was identified in S.agalactiae <SEQ ID 1919> which encodes the amino 
acid sequence <SEQ ID 1920>. Analysis of this protein sequence reveals the following:

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2578 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD35341 GB:AE001708 DNA processing chain A [Thermotoga maritima]
Identities = 97/231 (41%), Positives = 149/231 (63%), Gaps = 2/231 (0%)

Query: 51 FIENYKQLDLKKLRQEFKKFPV- -LSILDSNYPLELKEIYNPPVLLFYQGNIELLSKPKL 108 
F+E + +L++ ++ +K V +S + +YP L+EI PP +LF +G+ ELL + + 
Sbjct: 41 FLEKCGKEELERQKELIRKHNVKLVSFWEDDYPQHLREIRYPPAVLFWGDAELLKEKCT 100

Query: 109 AWGARQASQIGCQSVKKIIKETNNQFVIVSGLARGIDTAAHVSALKNGGSSIAVIGSGL 168 

WG R+ + G K+ +K + FVIVSG+A GID+ AH AL +GG ++AV+G+G+ 
Sbjct: 101 GWGTRRPTSYGVNVTKRFVKLLSEYEVIVSGMAFGIDSVAHKEALSSGGKTVAVLGTGV 160 

Query: 169 DVYYPTENKKLQEYMSYNHLVLSEYFTGEQPLKFHFPERMIIIAGLCQGIVVAEAKMRSG 228 

DV YP N++L + N V+SEY G + K HFP RNRIIAGL I+V EA ++SG 
Sbjct: 161 DWYPRSNERLFHE I VKNGCWSEYPMGTRARKHHFPARNR1 IAGLSDAI I VTEAP I KSG 220 

Query: 229 SLITCERALEEGREVFAIPGNIIDGKSDGCHHLIQEGAKCIISGKDILSEY 279

+LIT + ALE GR+VFA+PG+I S+G ++LI+ GA + +D+ + + 
Sbjct: 221 ALITVKFALESGRDVFAVPGDIDRKTSEGTNYLIKSGAYPLTDEEDLETHF 271 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1921> which encodes the amino acid
sequence <SEQ ID 1922>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2856 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 185/279 (66%), Positives = 238/279 (84%), Gaps = 1/279 (0%)

Query: 1 MNHFELFKLKKAGLTOI^NIHNIINYLKKNSLTSLSVRNMAWSKCKNPTFFIENYKQLDL 60 

+NHFEL+KLKKAGLTN NI NI++Y +K+ SLS+R+MAWS CK+P+ FIE YKQLD+ 
Sbjct: 1 vNHFELYKLKKAGLTNKNILNILDY-QKHQEKSLSLRDMAWSGCKHPSHFIEAYKQLDI 59 



Query: 61 KKLRQEFKKFPVLSILDSNYPLELKEIYNPPVLLFYQGNIELLSKPKLAWGARQASQIG 120 

+ L+ EFK+FP +SILD +YP+ LKEIYNPPVLLF+QGN++LL KPKLA+VG+R++S G 
Sbjct: 60 QNLKMEFKQFPSISILDKHYPMALKEIYNPPVLLFFQGNLDLLEKPKLAIVGSRRSSDTG 119 



Query: 121 CQSVKKIIKETIMQFVIVSGIjARGIDTAAHVSALKNGGSSIAVIGSGLDVYYPTENKKLQ 180

+SV+KI+KE N+FVIVSGLARGIDT+AH++ LKNGG +IA+IG+GLD +YP EN++LQ 
Sbjct: 120 VKSVRKILKELGNRFVIVSGLARGIDTSAHLACLKNGGQTIAI IGTGLDRFYPKENRELQ 179



Query: 181 EYMSYNHLVLSEYFTGEQPLKFHFPERNRIIAGLCQGIWAEAKMRSGSLITCERALEEG 240
++ NHLVL+EY GE+ L +HFPERNRI IAGL +GI+V KAK RSGSLITC+ +EEG
Sbjct: 180 TFLGKNHLVLTEYGPGEEALSYHFPERNRIIAGLSRGIL\A7EAKNRSGSLITCQIGIEEG 239 

Query: 241 REVFAIPGNIIDGKSDGCHHLIQEGAKCIISGKDILSEY 279 
R++FA+PGNI+DGKS+GC LI+EGA C+ SG DILSEY

Sbjct: 240 RD I FAVPGNI LDGKSEGCLQLI KEGATCVTSGMDILSEY 278 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 619

A DNA sequence (GBSx0659) was identified in S.agalactiae <SEQ ID 1923> which encodes the amino 
acid sequence <SEQ ID 1924>. This protein is predicted to be lipoprotein (ceuE). Analysis of this protein 
sequence reveals the following: 

Possible site: 24
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



>GP:CAA06500 GB:AJ005352 lipoprotein [Staphylococcus aureus] 
Identities = 122/348 (35%) , Positives = 201/348 (57%) , Gaps = 16/348 (4%) 

Query: 1 MTKKLI IAILALCTILTTSQAVLAKEKSQ TVTIKNNYSVYIKKEKRDKPDNK 52 

M K ++ +LA+ +h KE+S+ TV 1+NNY + + EK+D D K 

Sbjct: 1 MKKTVLYL VIAVMFLliAACGNNSDKEQSKSETKGSKDTvKlENlSTy KM - - RGEKKDGSDAK 58 

Query: 53 KQISETLKVPLKPKKVWFDMGALDTITALGAEKSVIGIPKAKNALSLLPNNVKSVYKAK 112

K + ET++VP P+ W D GALD + +G V +PK + SL PN ++S +K 
Sbjct: 59 K-VKETVBVPKNPENAVVLDYGALDVMKEMGLSDKVKALPKGEGGKSL-PNFLES-FKDD 115 

Query: 113 RYQDVGSLFEPNFEAIARMQPDWFLGARMASVDNIEKLKEAAPKAALVYAGVDSKKVFD 172 
+Y +VG+L E NF+ IA +P+V+F+ R A+ N+++ K+AAPKA +VY G D K +

Sbjct: 116 KYTNVGNLKEVNFDKIAATKPEVIFISGRTANQKNLDEFKKAAPKAKIVYVGADEKNLIG 175 

Query: 173 KGVAERVTMLGKIFDQNKKAKTFNKDIAQAVLKLQKTIEKKGKPTALFVMANSGELLTQS 232 
+ + +GKI+D+ KAK NKD+ + ++ + K T ++++ N GEL T 
Sbjct: 176 S-MKQNTENIGKIYDKEVKAKEI^KDLDNKIASMKDKTKNFNK-TVMYLLVNEGELSTFG 233

Query: 233 PSGRFGW-IFSVGGFKAVNENEKLSSHGTPVSYEYIAEKNPNYLFVLDRGATIGQGASSK 291 

P GRFG ++ GF AV++ S+HG VS EY+ ++NP+ + +DRG + +++K 
Sbjct: 234 PKGRFGGLVYDTLGFNAVDKKVSNSNHGQOTSNEYVNKENPDVILAMDRGQAVSGKSTAK 293 



Query: 292 ELFNNDVIKATDAVKNKRVHEVDGKDWYINSGGSRVTLRMIKDVQNFV 339 

+ NN V+K A+K +V+ +D K WY +G + T++ V 
Sbjct: 294 QAIJ^PvXiKNVKAIKEDKVYNLDPKLWYFAAGSTTTTIKQIEELDKW 341 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1925> which encodes the amino acid
sequence <SEQ ID 1926>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 57/255 (22%) , Positives = 104/255 (40%) , Gaps = 30/255 (11%) 

Query: 66 KKVWFDMGALDTITALGAEKSVIGIPKAKNALSLLPNNVKSVYKAKRYQDVGSLFEPNF 125 
+++V + +D L + ++G+ +K L LP +V + VG P+

Sbjct: 45 QRI VATS VAWDI CDRLNLD - - LVGVCDSK- -LYTLPKRYDAVKR VGLPMNPDI 94 

Query: 126 EAIARMQPDWFLGARmSVDNIEKLKEMPKAALVYAGVDSKKVFDKGVAERVTMLGKI 185 
E IA ++P + + EL+ K Y++ + V +G+ + + LG + 

Sbjct: 95 ELIASLKPTWILSPNSLQ EDLEPKYQKLDTEYGFLNLRSV- -EGMYQSIDDLGNL 147

Query: 186 FDQNKKAKTFNKDIAQAVLKLQKT1EKKGKPTALFVMANSGELLTQSPSGRFGWIFSVGG 245 

F + ++AK + Q +KKPL+M GL+ G++G 

Sbjct: 148 FQRQQFAKELRQQYQDYYRAFQAKRKGKKKPKV7LILMGLPGSYLVATNQSYVGNLLDLRG 207 

Query: 246 FKAV NENEKLSSHGTPVSYEYIAEKNPNYLFVLDRGATIGQGAS SKELFNNDVI 299 

+ V +E E LS++ E + K P+ +L I KE ND+ 

Sbjct: 208 GENVYQSDEKEFLSANP EDMLAKEPD--LILRTAHAIPDKVKVMFDKEFAENDIW 260 

Query: 300 KATDAVKNKRVHEVD 314

K AVK +V+++D 
Sbjct: 261 KHFTAVKEGKVYDLD 275 

SEQ ID 1924 (GBS181) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 39 (lane 5; MW 38.7kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 46 (lane 3; MW 64kDa). 

The GBS181-GST fusion product was purified (Figure 204, lane 9) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 299), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 620 

A DNA sequence (GBSx0660) was identified in S.agalactiae <SEQ ID 1927> which encodes the amino 
acid sequence <SEQ ID 1928>. This protein is predicted to be iron(III) ABC transporter, ATP-binding 
protein. Analysis of this protein sequence reveals the following:

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3231 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB12190 GB:Z99106 similar to ferrichrome ABC transporter
(ATP-binding protein) [Bacillus subtilis]
Identities = 125/247 (50%), Positives = 187/247 (75%)

Query: 1 MIQINNLHKFYGQKEILKDINISIPKGKVTAILGPNGSGKSTLLSCISRLEPYDNGEIFL 60 
M+++ N+ K YG K +L++ +++I KGK+T+ +GPNG+GKSTLLS +SRL D+GEI++

Sbjct: 1 MVEVRNVSKQYGGKVVTjEETSVTIQKGKITSFIGPNGAGKSTLLSIMSRLIKKDSGEIYI 60 

Query: 61 DKVPLAHYSSNDIAKTLA.ILRQSNHLTLKIKVRDLIGFGRFPYSKGRLSQKDKAVIESVI 120 
D + S +LAK ++IL+Q+N + +++ ++DL+ FGRFPYS+GRL+++D I + 
Sbjct: 61 DGQEIGACDSKELAKKMSILKQANQINIRLTIKDLVSFGRFPYSQGRLTEEDWVHINQAL 120

Query: 121 SYMDLNDIADEFINNLSGGQIQRAFIAMTMAQDTQYICLDEPLNNLDMKyAVQMMDLIKR 180

SYM L DI D++++ LSGGQ QRAFIAM +AQDT YI LDEPLNNLDMK++V++M L+KR 
Sbjct: 121 SYMKLEDIQDKYLDQLSGGQCQRAFIAMVIAQDTDYIFLDEPL1OTLDMKHSVEIMKLLKR 180 

Query: 181 YAYEFNKTI VI I IHDINFATHYADNWALKEGQWTCGTVEDVMQEKILSHLFDMPIRIE 240 

E KTIVI+IHDINFA+ Y+D +VALK G++V G E++++ +L ++DM I 1+ 
Sbjct: 181 LVEELGKTIVIVIHDINFASVYSDYIVALKNGRIVKEGPPEEMIETSVLEEIYDMTIPIQ 240 

Query: 241 TVDGKPI 247 

T+D + I 
Sbjct: 241 TIDNQRI 247 

There is also homology to SEQ ID 1930. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 621 

A DNA sequence (GBSx0661) was identified in S.agalactiae <SEQ ID 1931> which encodes the amino 
acid sequence <SEQ ID 1932>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.6095 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12189 GB:Z99106 similar to ferrichrome ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 138/315 (43%) , Positives = 222/315 (69%) , Gaps = 6/315 (1%) 

Query: 9 KLLILLILLIAAIILFLIYGIPTDANEFLIIYILKTRYQKLIALILVGICIGSSSLIFQT 68 

K+ +L+ L I I LFL Y + Y L R +K+ A++L G I S++IFQT 

Sbjct: 6 KIALLVGLAIVCIGLFLFYDLGNWD YTLPRRIKKVAAIVLTGGAIAFSTMIFQT 59 

+TNNR+LTPSI+GLDSLY+LIQTG+++L G+ ++ + +F++S+LLM+ F+ +L+ I 
Sbjct: 60 ITNNRILTPSILGLDSLYMLIQTGIIFLFGSANMVTMNKNINFIISVLLMILFSLVLYQI 119 

Query: 129 LFRNKKQSLYFVLLAGLIFNTLFSSISSFIQAIMDPNDFMILQNQLFASFNAINTKILWI 188 

+F+ + ++++F+LL G++F TLFSS+SSF+Q ++DPN+F ++Q+++FASFN INT +LW+ 
Sbjct: 120 MFKGEGRNIFFLLLIGIVFGTLFSSLSSFMQMLIDPNEFQWQDKMFASFNNINTDLLWL 179 

Query: 189 SFIIIWSFVINWPFIKELDVLLLGKENAISLGISYQKLTTRFFLWLAL^IVAIATALVGP 248 

+FII +++ V W F K DVL LG+E+A++LGI Y K+ + + +A++V+++TALVGP 
Sbjct: 180 AFIIFLLTGVYVWRFTKFFDvLSLGREHAVNLGIDYDKVVKQMLIVVAILVSVSTALVGP 239 

Query: 249 ITFLGLLVAHITYHSFHTFRHQILVPIAIVICIFTLVLGQHLVQNLLHLTVQLSVLLNLI 308 

I FLGLLV ++ T++H L+ ++ I I LV GQ +V+ + + LSV++N 

Sbjct: 240 IMFLGLLVVNLAREFLKTYKHSYLIAGSVFISIIALVGGQFVVEKVFTFSTTLSVIINFA 299 



Query: 309 GGSYFIFTLIKGRKN 323 
GG YFI+ L+K K+
Sbjct: 300 GGIYFIYLLLKENKS 314

A related DNA sequence was identified in S.pyogenes <SEQ ID 1933> which encodes the amino acid 
sequence <SEQ ID 1934>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.6456 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9175> which encodes the amino acid sequence
<SEQ ID 9176>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.646 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 80/326 (24%) , Positives = 157/326 (47%) , Gaps = 34/326 (10%) 

Query: 10 LLILLILLIAAIILFLIYGIPTDANEFL IIYILKTRYQKLIALILVGICI 59 

+L++L LL A+I + G+ + + I R+ +++ +L G I 

Sbjct: 34 VLLILSLLFLAVIALSLGGLAVSYGAIVKGLFVAYDPQVALIYDLRFPRIVIALLAGAGI 93 

Query: 60 GSSSLIFQTLTNNRLLTPSIIGL DSLYILIQTGLMYLIGAQRVIKFSSFSSFL L 113 

S ++FQ + N + P+IIG+ S +L+ + L+ +++ + SFL + 

Sbjct: 94 AVSGVLFCAVLKNPISDPAI IGICSGASFMVLVSSLLL PQLLLYGPIVSFLGGGV 148 

Query: 114 SLLLMVGFAYLLFTILFRNKKQSLYFVLLAGLIFNTLFSSISSFIQAIMDPNDFMILQNQ 173 

S LL+ G A+ K + ++L G+ N LF +S+ + + M+ N 

Sbjct: 149 SFLLIYGLAW - -KKGLNPIRLILTGIAINALFMGLSTALTSFFTSASPMV- -NA 198 

Query: 174 LFASFNAINTKI-LWISFIIIWSFVINWPFIKELDVLLLGKENAISLGISYQKLTTRFF 232 

LA + T + + F +++ K ++LLL + LGI L 

Sbjct: 199 LlAGHISQKTWADVGvLFPYTFIGLLLALLLSKTOSttiLLLDDQVIRHLGIDATALRLGIS 258 



Query: 233 LWLALMVAIATALVGPITFLGLLVAHITYHSFHTFRHQILVPIAIVICIFTLVLGQHLVQ 292 
L L+ ++AT++VG ++FLGL+V H++ + +HQIL+P + ++ F +L L +
Sbjct: 259 LVAVLIASVATS IVGWSFLGLI VPHMSRLLVGS - KHQILI PFSALLGAFVFLLADTLGR 317

Query: 293 NLLH-LTVQLSVT.LNLIGGSYFIFTL 317 
+L + L + +++++++GG YFI+ L 
Sbjct: 318 SLAYPLEISPAIIMSIVGGPYFIYLL 343

A related DNA sequence was identified in S.pyogenes <SEQ ID 2491> which encodes amino acid sequence
<SEQ ID 2492>. An alignment of the GAS and GBS sequences follows: 

Score = 51.9 bits (122), Expect = 5e-08 
Identities = 73/327 (22%), Positives = 137/327 (41%), Gaps = 38/327 (11%)

Query: 494 IISSLGTAISTVAQGIGTGLAIAFRGLGAAIAMVPPTTWLALGTAILMVGAAFALAGTQA 553 

+ 1 L T + G L IA +GA + +V A+ L++ A 

Sbjct: 573 VILGLVTTAVMMLLGAIAPLVIAIGAIGAPVGIWAAIVGAIAVITLIIQAIMNWGA--- 629 

Query: 554 DGISQILRTIGDXXXXXXXXXTDSIATLLTIIANAIGSMLPIVAGAISQIVG A 606 

I++ L++ D ++ T T A + ++G S +V + 
Sbjct: 630 - - ITEWLQSTWDSCAAWXSELWTNI VTTAT TAWSNFTAWLSGLWSS VVSTGQSLWSS 684 

Query: 607 VAGGLSQLIIAVSTGVSLVIGAFTGLLGGI-SGVINSISAVIQSLTGVITAVFNGIATVI 665

LS + ++ TG + +FT L + SG++++ S + +L+ 1+ +FNGI + 
Sbjct: 685 FTSSLSNIFSSLITGAQSLWSSFTSTLSNLWSGLVSTGSNLFNNLSSTISGIFNGILSTA 744 

Query: 666 SSVGSTIKDVLTGLGTAFEGFGNGVKSALEGVGAVIESFGSAVR NVLDGVAN 717 

S++ ++IK ++ A +G N V + GV A+ F ++ + G AN

Sbjct: 745 SNIWNSIKSTIS NAIDGAKNAVSN GVNAIKNLFNFQ I KWPHI PLPHFRVSGSAN 798 

Query: 718 I LDSM- - GTAALNAGRGVKEMAKGI KMLVDL S LGDLVATLAAVASGLGKMASSAGEMTTL 775 
LD + G ++ G+ AKG ++ +L + A V G A +TL 
Sbjct: 799 PLDWLKGGLPSI GIDWYAKG-GIMTKPTLFGMNGNRAMVGGEAGAEAILPLNKSTL 853

Query: 776 GSAMSKVANGMTRLATSAT1AITGLTV 802 

G+ +AN M + + + +G+T+ 
Sbjct: 854 GAIGQS I ANTM -NTSNNINVNFSGVTI 879

Score = 33.2 bits (74), Expect = 0.019

Identities = 83/477 (17%) , Positives = 175/477 (36%) , Gaps = 103/477 (21%) 

Query: 420 GSFLDKISTKFGLFGKKAKEGTD QAANGSRKSGGI I SQIFNGLGNI 465 

G + +++T+FGL G+K K ++ +A ++++ LG + 

Sbjct: 313 GDAVGEIiNTQFGLTGEKLKSASELLIKYAEINETDISSSAISAKQAIEAYGLTAEDLGMV 372

Query: 466 VKSAGTAISTAAKGIGTGIKTALSGAPPIISSLGTAISTVA QGIGTGLAIA- 516 

+ + A + + T ++ A+ GAP I LG + A G+ + A++ 

Sbjct: 373 LDNVTKAAQDTGQSVDTIVQKAIDGAPQ-IKGLGLSFEEGAALIGKFEKSGVDSSAALSS 431 

Query: 517 FRGLGAAIAMVPPTT- -WLALGTAILMVGAAFALAGTQA 553 

GL ++ + +T AL A + G+ A A 
Sbjct: 432 LSKAAVIYAKDGKTLTDGLNETVSAIQNSTSETEALSIASEIFGSKAAPRMVDAIQRGAF 491 

Query: 554 --DGISQILRTIGDXXXXXXXXXTDSLATLLTI IAMAIGSMLPIVAGAISQIV 604

D +++ ++ D + L +A G +L V A+ ++ 

Sbjct: 492 SFDDLAFAAKSSSGTVSTTFDETLDPIDKLTQYSNQAKEGMAELGGKLLETVIPALEPLM 551 

Query: 605 GAVAGGLS QLII AVSTGVSIiVIGAFTGL LGGISGVINSISAVIQ 648 

G + ++ Q 1+ V+T V +++GA L +G I + + A I

Sbjct: 552 GMLESSVNWFTSLNETDQQTIVILGLVTTAVMMLLGAIAPLVIAIGAIGAPVGIWAAIV 611 

Query: 649 SLTGVITAVFNGI ATVISSVGSTIKDVLTGLGTAFEGFGNGVK 691 

VIT +1 AS + + I T+F + +G+ 

Sbjct: 612 GAIAVITLIIQAIMNWGAITEV&QSTTOSCS^VKSELWraiVTTATTAWSNFTAWLSGLW 671

Query: 692 SALEGVG-AVIESFGSAVRNV LDGVANILDSMGTAALNAGRGVKEMAKGIKMLVDL 746 

S++ G ++ SF S++ N+ + G ++ S + N G+ + 
Sbjct: 672 SSWSTGQSLWSSFTSSLSN1FSSLITGAQSLWSSFTSTLSNLWSGLVSTGSNL 725 



Query: 747 SLGDLVATLAAVASGLGKMASSAGEMTTLGSAMSKWANGMTRLATSATIAITGLTVF 803
+L +T++ + +G+ +++++ ++ S +S +G ++ AI L F

Sbjct: 726 -FNNLSSTISGIFNGI - -LSTASNIWNSIKSTISNAIDGAKNAVSNGVNAIKNLFNF 779 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 622 

A DNA sequence (GBSx0662) was identified in S.agalactiae <SEQ ID 1935> which encodes the amino 
acid sequence <SEQ ID 1936>. Analysis of this protein sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2277 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 623 

A DNA sequence (GBSx0663) was identified in S.agalactiae <SEQ ID 1937> which encodes the amino
acid sequence <SEQ ID 1938>. This protein is predicted to be membrane protein (ceuB). Analysis of this
protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.83 Transmembrane 289 - 305 ( 287 - 308)
INTEGRAL Likelihood = -4.67 Transmembrane 24 - 40 ( 22 - 46)
INTEGRAL Likelihood = -4.35 Transmembrane 69 - 85 ( 68 - 86)
INTEGRAL Likelihood = -4.19 Transmembrane 200 - 216 ( 198 - 216)
INTEGRAL Likelihood = -2.76 Transmembrane 107 - 123 ( 107 - 123)
INTEGRAL Likelihood = -0.85 Transmembrane 258 - 274 ( 258 - 274)



Final Results 

bacterial membrane Certainty=0.5522 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8621> which encodes amino acid sequence <SEQ ID 8622> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2

SRCFLG: 0

McG: Length of UR: 23

Peak Value of UR: 2.64

Net Charge of CR: 2
McG: Discrim Score: 8.59

GvH: Signal Score (-7.5): -4.6

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq

Amino Acid Composition: calculated from 1

ALOM program count: 9 value: -11.30 threshold: 0.0

INTEGRAL Likelihood = -11.30 Transmembrane 226 - 242 ( 222 - 259)
INTEGRAL Likelihood = -6.42 Transmembrane 112 - 128 ( 103 - 134)
INTEGRAL Likelihood = -5.79 Transmembrane 137 - 153 ( 135 - 159)
INTEGRAL Likelihood = -4.67 Transmembrane 9 - 25 ( 7 - 31)
INTEGRAL Likelihood = -4.35 Transmembrane 54 - 70 ( 53 - 71)
INTEGRAL Likelihood = -4.19 Transmembrane 185 - 201 ( 183 - 201)
INTEGRAL Likelihood = -3.08 Transmembrane 268 - 284 ( 265 - 284)
INTEGRAL Likelihood = -2.76 Transmembrane 92 - 108 ( 92 - 108)
INTEGRAL Likelihood = -0.85 Transmembrane 243 - 259 ( 243 - 259)
PERIPHERAL Likelihood = 5.73 203

modified ALOM score: 2.76

icml HYPID: 7 CFP: 0.552

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.5522 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12188 GB:Z99106 similar to ferrichrome ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 149/304 (49%) , Positives = 234/304 (76%) 



Query: 29 LVILSLTSLFVGVKSIPLEQITHLDQSQVDIFLTSRLPRTISILISGASLSVCGLLMQQL 88
L+IL++TS+F+GV+ + + L + + SRLPR ISI+I+G S+S+CGL+MQQ+
Sbjct: 10 LI ILAVTSVFIGVEDLSPLDLFDLSKQEASTLFASRLPRLISI VIAGLSMSI CGLIMQQI 69

Query: 89 TQNKFVSPTTSGTMDWAKLGVVVTLIFFKNTSIFIQLCIASGFAILGSLLFVTILKMITF 148
++NKFVSPTT+GTMDWA+LG++++L+ F + S I++ +A FA+ G+ LF+ IL+ I F
Sbjct: 70 SRNKFVSPTTAGTMDWARLGILISLLLFTSASPLIKMLVAFVFALAGNFLFMKILERIKF 129

Query: 149 KDNIFIPLIGLMLGQIVAARTVFLGTHFQVLQSVNSWLQGNFSIMTSHRYEILYLALPCL 208
D IFIPL+GLMLG IV++ F+ + ++Q+V+SWLQG+FS++ RYE+LYL++P +
Sbjct: 130 NDTIFIPLVGLMLGNIVSSIATFIAYKYDLIQNVSSWLQGDFSLWKGRYELLYLSIPLV 189

Query: 209 FLVYFFAHQFTIVGLGESFAKNLGVAYEKMIYFGLVLVSIMTSLVIIIVGALPFLGLIVP 268
+ Y +A +FT+ G+GESF+ NLG+ Y++++ GL++VS++TSLVI+ VG LPFLGLI+P
Sbjct: 190 I IAYVYADKFTIAGMGES FSVNLGLKYKRWNIGLI I VSL ITSLVILTVGMLPFLGL IIP 249

Query: 269 NLISITKGDHMSSTILETSLLGACIVMICDLFGRLVIFPYEVSIGVTLGVLGSAFFLISI 328
N++SI +GD++ S++ T LLGA V+ CD+ GR++IFPYE+SIG+ +G++GS FL +
Sbjct: 250 NIVSIYRGDNLKSSLPHTVLLGAVFVLFCDILGRIIIFPYEISIGLMVGIIGSGIFLFML 309

Query: 329 IRNE 332
+R +
Sbjct: 310 LRRK 313





There is also homology to SEQ ID 1940. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 624 

A DNA sequence (GBSx0664) was identified in S.agalactiae <SEQ ID 1941> which encodes the amino 
acid sequence <SEQ ID 1942>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.90 Transmembrane 140 - 156 ( 140 - 156)

Final Results

bacterial membrane Certainty=0.1362 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
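
The INTEGRAL line reported above for this protein gives a likelihood value and the residue range of the predicted transmembrane segment. As a hedged illustration, assuming the line keeps the layout "INTEGRAL Likelihood = value Transmembrane start - end", the fields could be extracted as sketched below. `Segment` and `parse_integral_line` are hypothetical names introduced for this example only.

```python
import re
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """One predicted transmembrane segment (hypothetical container)."""
    likelihood: float
    start: int
    end: int


# Matches the leading part of an INTEGRAL line; the bracketed outer range
# that follows it in the text is ignored by this sketch.
_INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral_line(line: str) -> Optional[Segment]:
    """Return the parsed segment, or None if the line has another layout."""
    match = _INTEGRAL_LINE.search(line)
    if match is None:
        return None
    likelihood, start, end = match.groups()
    return Segment(float(likelihood), int(start), int(end))
```

For the line above, the parsed segment spans residues 140 to 156, that is 17 residues.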

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB06720 GB:AP001517 maltose transacetylase (maltose
O-acetyltransferase) [Bacillus halodurans]
Identities = 93/182 (51%), Positives = 125/182 (68%), Gaps = 2/182 (1%)

Query: 2 TEKEKMLAGQYYRPSAPELRKDREVALKNMQAFNN--EDNSSKRNVILQKWFGATGKSIH 59 

TEKEKMLAG+ Y+ PEL KDRE A + + FN E +R ++++ FG+ G+S++ 
Sbjct: 3 TEKEKMLAGERYKAWDPELVKDRERARRLTRLFNQTTETEEKQRTELIKELFGSMGESVN 62 

Query: 60 MEQRFVCDYGCNIYVGENFYANFNQTFLDVCEIRIGDNCMFGPNCQLLTPLHPLDPIERN 119

+E F CDYG NI+VG NF+ANF+ LDVCE+RIG NCM P + T HP+ P+ER 
Sbjct: 63 IEPTFRCDYGYNIHVGNNFFANFDCVILDVCEVRIGANCMLAPGvHIYTATHPIHPLERV 122 

Query: 120 SGLEYGAPIQrGNNVWLG^VTILPGVVLGDNVWGAGSVVTKSFENNWIAGNPAKIIKKL 181
G EYG P+ I NNVW+GG + PGV +G+N V+ +GSWTK NW+AGNPAK+ 1 + +

Sbjct: 123 EGPEYGKPVTIRNNVWIGGRAIVNPGVTIGNNAVIASGSVVTKDVPENVVVAGNPAKVIQTI 184 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1943> which encodes the amino acid
sequence <SEQ ID 1944>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4052 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/188 (36%) , Positives = 101/188 (53%) , Gaps = 13/188 (6%) 



Query: 2 TEKEKMLAGQYYRPSAPELRKDREVALKNMQAFN NEDNSSKRNVILQKWFGA 53 

TE +KM G++Y + D E+ K M A + +R+ +L + FG 

Sbjct: 3 TEFDKMTRGEWY DANFDSELIQKRMMAQDLCFDLNQLKPSREEERSAVLNQLFGQ 57 



Query: 54 TGKSIHMEQRFVCDYGCNIYVGENFYANFNQTFLDVCEIRIGDNCMFGPNCQLLTPLHPL 113

+ + + + F+CDYG NI G+N + N N F+D +1 +GDN GP+ T HPL 
Sbjct: 58 SFEGLVLLSPFICDYGKNITFGKNCFINSNCYFMDGAKIALGDNVFVGPSTGFYTANHPL 117 

Query: 114 DPIERNSGLEYGAPIQIGNNVWLGGGVTILPGVVLGDNVWGAGSVVTKSFENNVVIAGN 173 
D RN GLE PI IG+NVW G V ++PGV +G V+ +GSWT N + AG

Sbjct: 118 DYKRRNEGLEKMiPITIGDNVWFGAN^ 177 

Query: 174 PAKIIKKL 181 
P ++++K+ 

Sbjct: 178 PCQWRKI 185

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 625 

A DNA sequence (GBSx0665) was identified in S.agalactiae <SEQ ID 1945> which encodes the amino
acid sequence <SEQ ID 1946>. This protein is predicted to be ribonuclease H (rnhB-2). Analysis of this 
protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 79 - 95 ( 79 - 95)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9823> which encodes amino acid sequence <SEQ ID 9824> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13479 GB:Z99112 ribonuclease H [Bacillus subtilis] 
Identities = 128/249 (51%) , Positives = 168/249 (67%) 



Query: 4 TIKEIKAILETIVDLKDKRWQEYQTDSRAGVQKAILQRKKNIQSDLDEFARLEQMLVYEK 63

T+K+IK L+ + D+D + + DRVQ+QK + ++ M YE+
Sbjct: 5 TVKDIKDRLQEVKDAQDPFIAQCENDPRKSVQTLVEQWLKKQAKEKALKEQWVNMTSYER 64

Query: 64 KLYIEHINLIAGIDEVGRGPLAGPWAAAVILPPNCKIKHLNDSKKIPKKKHQEIYQNIL 123

+ LIAG+DEVGRGPLAGPWA+AVILP C+I L DSKK+ +KK +E Y+ I+
Sbjct: 65 LARNKGFRLIAGVDEVGRGPLAGPWASAVILPEECEILGLTDSKKLSEKKREEYYELIM 124

Query: 124 DQALAVGIGIQDSQCIDDINIYEATKHAMIDAVSHLSVAPEHLLIDAMVLDLSIPQTKII 183

+ALAVGIGI ++ ID+INIYEA+K AM+ A+ LS P++LL+DAM L L Q II
Sbjct: 125 KEALAVGIGIVEATVIDEINIYEASKMAMVKAIQDLSDTPDYLLVDAMTLPLDTAQASII 184

Query: 184 KGDANSLSIAAASIVAKVTRDKIMSDYDSTYPGYAFSKNAGYGTKEHLEGLQKYGITPIH 243

KGDA S+SIAA + +AKVTRD++MS Y TYP Y F KN GYGTKEHLE L YG T +H
Sbjct: 185 KGDAKSVSIAAGACIAKVTRDRMMSAYAETYPMYGFEKNKGYGTKEHLEALAAYGPTELH 244

Query: 244 RKSFEPIKS 252

RK+F P++S
Sbjct: 245 RKTFAPVQS 253





A related DNA sequence was identified in S. pyogenes <SEQ ID 1947> which encodes the amino acid 
sequence <SEQ ID 1948>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.53 Transmembrane 79 - 95 ( 79 - 95)

Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB13479 GB:Z99112 ribonuclease H [Bacillus subtilis] 
Identities = 130/252 (51%) , Positives = 176/252 (69%) , Gaps = 3/252 (1%) 

Query: 4 SIKAIKESLEAVTSLLDPLFQELATDTRSGVQKALKSRQKVIQAELAEEERLEAMLSYEK 63

++K IK+ L+ V DP + D R VQ ++ K E A +E+ M SYE+ 
Sbjct: 5 TVKDIKDRLQEVKDAQDPFIAQCENDPRKSVCTLVEQWLKKQAKEKALKEQWVNMTSYER 64 

Query: 64 ALYKKGYKAIAGIDEVGRGPIAGPWAACVILPKYCKIKGLNDSKKIPKAKHETIYQAVK 123 

KG++ IAG+DEVGRGPLAGPWA+ VILP+ C+I GL DSKK+ + K E Y+ + 
Sbjct: 65 IiAi^KGFRLIAGVDEVGRGPIAGPVVASAVILPEECEIIiGLTDSKKLSEKKREEYYELIM 124 



Query: 124 EKALAIGIGIIDNQLIDEVNIYEATKLAMLEAIKQLEGQLTQPDYLLIDAMTLDIAISQQ 183 

++ALA+GIGI++ +IDE+NIYEA+K+AM++AI+ L PDYLL+DAMTL + +Q 

Sbjct: 125 KEAlAVGIGIv^TVIDEINIYEASKMAMVKAIQDLS---DTPDYLLVDAMTLPLDTAQA 181 
Query: 184 SILKGDANSLSIAAASIVAKVTRDQMMANYDRIFPGYDFAKNAGYGTKEHLQGLKAYGIT 243

SI+KGDA S+SIAA + +AKVTRD+MM+ Y +P Y F KM GYGTKEHL+ h AYG T 
Sbjct: 182 SIIKGDAKSVSIAAGACIAKVTRDRMMSAYAETYPMYGFEKNKGYGTKEHLEALAAYGPT 241 

Query: 244 PIHRKSFEPVKS 255

+HRK+F PV+S 
Sbjct: 242 ELHRKTFAPVQS 253 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 168/256 (65%) , Positives = 203/256 (78%) , Gaps = 3/256 (1%)

Query: 1 MMATIKEIKAILETIVDLKDKRWQEYQTDSRAGVQKAILQRKKNIQSDLDEEARLEQMLV 60 

M +IK IK LE + L D +QE TD+R+GVQKA+ R+K IQ++L EE RLE ML 
Sbjct: 1 MPTSIKAIKESLEAVTSLLDPLFQELATDTRSGVQKALKSRQKVIQAELAEEERLEAMLS 60 



Query: 61 YEKKLYIEHINLIAGIDEVGRGPI^GPWAAAVILPPNCKIKHLNDSKKIPKKKHQEIYQ 120 

YEK LY + IAGIDEVGRGPLAGPWAA VILP CKIK LNDSKKIPK KH+ IYQ 
Sbjct: 61 YEKALYKKGYKAIAGIDEVGRGPLAGPWAACVILPKYCKIKGIiNDSKKIPKAKHETIYQ 120 



Query: 121 NILDCALAVGIGIQDSQCIDDINIYEATKHAMIDAVSHLS VAPEHLLIDAMVLDLSI 177

+ ++ALA+GIGI D+Q ID++NIYEATK AM++A+ L P++LLIDAM LD++I 

Sbjct: 121 AVKEKALAIGIGIIDNQLIDEWIYEATKLAMLEAIKQLEGQI/FQPDYLLIDAMTLDIAI 180 

Query: 178 PQTKIIKGDANSLSIAAASIVAKVTRDKIMSDYDSTYPGYAFSKMAGYGTKEHLEGLQKY 237 
Q I+KGDANSLSIAAASIVAKVTRD++M++YD +PGY F+KNAGYGTKEHL+GL+ Y

Sbjct: 181 SQQSILKGDANSLSIAAASIVAKVTRDQMMANYDRIFPGYDFAKNAGYGTKEHLQGLKAY 240

Query: 238 GITPIHRKSFEPIKSM 253 
GITPIHRKSFEP+KSM 
Sbjct: 241 GITPIHRKSFEPVKSM 256

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 626 

A DNA sequence (GBSx0666) was identified in S.agalactiae <SEQ ID 1949> which encodes the amino
acid sequence <SEQ ID 1950>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1865 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 627 

A DNA sequence (GBSx0667) was identified in S.agalactiae <SEQ ID 1951> which encodes the amino
acid sequence <SEQ ID 1952>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3034 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06195 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 140/281 (49%) , Positives = 195/281 (68%) , Gaps = 5/281 (1%) 

Query: 3 TIQWFPGHMSKARRQVQENIKHVDFVTILVDARLPLSSQNPMLTKIVGDKPKLMILNKAD 62

TIQWFPGHM+KARR+V E +K +D V L+DAR+PLSS+NPM+ +IV KP+L++I1NK D 
Sbjct: 2 TIQWFPGHMAKARREVTEKLKLIDWIELLDARVPLSSRNPMMDEIVAHKPRLVLLNKDD 61 

Query: 63 IADPIRTKEWRDFYESQGLKTLAINSKEQSTVKKVTDIAKILMSDKIANLRGRGIQKETIi 122 
LADP +TKEW F+E G L IN++ V +++ + L I R +G++ +

Sbjct: 62 LADPSKTKEWTRFFEEGGATVLPINAQTGQGV'SRISPACQTLAQALIEKQRAKGMKPRAl 121 

Query: 123 RTMIIGIPNAGKSTLMNRLAGKKIAWGNKPGVTKGQQWLKSNKELEILDTPGILWPKFE 182 
R MI+GIPN GKSTL+NRLA K+IA VG++PG+TK QQW+K KELE+LDTPGILWPKF+ 
Sbjct: 122 RAMILGIPNVGKSTLINRLASKRIAKVGDRPGITKQQQWIKVGKELELLDTPGILWPKFD 181

Query: 183 DELVGLKLALTGAIKDQLLPMDEVTIFGLNYFKTYYPDRLKERFKSINLEDEAPEIIMAL 242

D+ G +LA TGAIKD+LL +V +F L Y + YPDRL +R+K L ++ + A+ 
Sbjct: 182 DQATGFR]^TGAlKDELLDFQDVALFVLRy^^^ffPDRLMDRYKLNELPEDGvTLFDAI 241 

Query: 243 TQKLGY RDDYDRFYNLFWEVRDGKLGRYTIiDIVGE 278 

+K G+ DYD+ + ++E+R G LGR TL++ G+ 

Sbjct: 242 GKKRGHLLSGGYIDYDKTAEMILRELRAGTLGRITLEVPGK 282 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1953> which encodes the amino acid
sequence <SEQ ID 1954>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2688 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 247/282 (87%) , Positives = 265/282 (93%) 

Query: 1 mTIQWFPGHMSKARRQVQENIKHVDFVTILVDARLPLSSQNPMLTKIVGDKPKLMILNK 60 
MA IQWFPGHMSKARRQVQEN+KHVDFVTILvDARLPLSSQNPMLTKIVGDKPKLMILNK 
Sbjct: 1 MAMIQWFPGHMSKARRQVQENVKHVDFVTILVDARLPLSSQNPMLTKIVGDKPKLMILNK 60

Query: 61 ADIADPIRTKEWRDFYESQGLKTIAINSKEQSTVKKOTDIAKILMSDKIANLRGRGIQKE 120 

ADLAD RTKEW+ +YESQG+KTLAINSKEQSTVKKVT+ AK LM+DKI LR RGIQKE 
Sbjct: 61 ADIiADATRTKEWKAYYESQGIKTIAINSKEQSTVKKVTEAAKELMADKIQRLRERGIQKE 120 

Query: 121 TLRTMIIGIPNAGKSTLMNRLAGKKIAWGNKPGVTKGQQWLKSNKELEILDTPGILWPK 180 

TLRTMIIGIPNAGKSTLMNRLAGKKIAWGNKPGVTKGQQWLKSNKELEILDTPGILWPK 
Sbjct: 121 TLRTMIIGIPNAGKSTLMNRLAGKKIAWGNKPGVTKGQQWLKSNKELEILDTPGILWPK 180 

Query: 181 FEDELVGLKLALTGAIKDQLLPMDEVTIFGLNYFKTYYPDRLKERFKSINLEDEAPEIIM 240

FEDELVGLKLALTGAIKDQLLPMDEVTIFGLNYF+ YYP+RL +RFK+I LE+EAPEIIM 
Sbjct: 181 FEDELVGLKLALTGAIKDQLLPMDEVTIFGLNYFREYYPNRLTKRFKNIPLEEEAPEIIM 240 

Query: 241 ALTQKLGYRDDYDRFYNLFVKEVRDGKLGRYTLDIVGEHDGN 282 
LT++LG++DDYDRFY LFVKEVRDGKLG+YTLD VG+ D +

Sbjct: 241 TLTRQLGFKDDYDRFYTLFVKEVRDGKLGQYTLDQVGDMDAD 282 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 628 

A DNA sequence (GBSx0668) was identified in S.agalactiae <SEQ ID 1955> which encodes the amino
acid sequence <SEQ ID 1956>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9825> which encodes amino acid sequence <SEQ ID 9826>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12129 GB:Z99105 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 69/173 (39%) , Positives = 102/173 (58%) , Gaps = 13/173 (7%) 

Query: 29 DKAKEKASV IKQASQTSQTSKKEVLQKKT YPNLNKYSNLEIHVSSTRQTMT 79

D A+E AS+ ++ + +T+K + K YP++ K ++ I V+ Q 

Sbjct: 22 DHAEEHASINTKKTVENITDTOKTAKTSIDOTKPSGGEYPDI-KQKHvWIDVNVKEQKAY 80 

Query: 80 ITSNDKVIFKTIVSTG AKESPTPKGTFVIEPERGDFFYNASSKEGAYYWVSFKEHGI 136 

I 1+ ++S+G K+ TPKGTF +EPERG++F++ +EGA YWVS+K HG

Sbjct: 81 IKEGSNTIYTMMISSGLDQTKDDATPKGTFYVEPERGEWFFSEGYQEGAEYWVSWKNHGE 140 

Query: 137 YLFHSVPTDQQGNEIPEEAKQIX3KAASHGCVRMSRADAKWFYENIPQGTTVTI 189 
+LFHSVP + I EA++LG SHGC+R++ DAKW YENIP+ T V I 
Sbjct: 141 FLFHSVPMTKDQKVIKTEAEKLGTKVSHGCIRLTIPDAKWVYENIPEHTKWI 193

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1956 (GBS644) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 130 (lane 2 & 3; MW 49.6kDa) and in Figure 186 (lane 3; MW 50kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
130 (lane 5-7; MW 24.6kDa) and in Figure 177 (lane 3; MW 25kDa).

GBS644-GST was purified as shown in Figure 236, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 629

A DNA sequence (GBSx0669) was identified in S.agalactiae <SEQ ID 1957> which encodes the amino 
acid sequence <SEQ ID 1958>. This protein is predicted to be carbon starvation protein A. Analysis of this 
protein sequence reveals the following: 

Possible site: 19
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.25 Transmembrane 129 - 145 ( 122 - 157) 

INTEGRAL Likelihood = -9.92 Transmembrane 316 - 332 ( 305 - 342) 

INTEGRAL Likelihood = -6.42 Transmembrane 164 - 180 ( 157 - 181) 

INTEGRAL Likelihood = -5.73 Transmembrane 443 - 459 ( 441 - 466) 
INTEGRAL Likelihood = -5.57 Transmembrane 416

Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF93852 GB:AE004154 carbon starvation protein A, putative
[Vibrio cholerae]

Identities = 220/470 (46%) , Positives = 311/470 (65%) , Gaps = 16/470 (3%) 

Query: 1 MVTFLGGVALLIVGYFTYGRYIEKNFQIDENRQTPAEALRDGYDFVPMP'KWKNGMIELLN 60 
M+ FL VA L+ GYF YG ++EK F I+E RQTPA DG D+VPM K +++LLN

Sbjct: 1 MLWFLTCVAALVGGYFIYGAFVEKVFGINEKRQTPAHTKTDGVDYVPMSTPKVYLVQLLN 60 

Query: 61 IAGTGPIFGPILGALYGPVAYIWIVLGCIFAGAVHDYMIGMISLRNNGAYLPELASRYLG 120 
IAG GPIFGPI+GALYGP A +WIV+GCIFAGAVHDY GM+S+RN GA +P + RYLG 
Sbjct: 61 IAGVGPIFGPIMGALYGPAAMLWIWGCIFAGAVHDYFSGMLSIRNGGASVPSITGRYLG 120

Query: 121 KSMKHVINIFSMLLLILVATVFWTPANLILSIIjPAG TLSLPWIIGLIFVYYLISTV 177 

KH +NIF+++LL+LV VFV PA +1 +++ T+S+ ++ +IF YY+++T+ 

Sbjct: 121 NGAKHF^IFAIVLLLLVGWE^SAPAGMITNLINQQTDFTVSMTTMVVIIFAYYILATI 180 

Query: 178 LPIDKALGKVYPVF -CVILMVSTAAVGFRLLTGGFDMPNLTFETFKNMHPAGLG 230 

+P+DK +G+ YP+F V LM + A + GGF++ ++ KN++P + 

Sbjct: 181 TOVIJKIIGRFYPLFGALLIFMSVGIjMTAIAFSSEHQVLGGFEISDMV KNLNPNDMP 236 

Query: 231 IFPALFFTISCGAISGFHATQAPMVSRTTVNEREGRFTFYGMMIAEGVIAMIWAGASMSL 290
++PALF TI +CGAI SGFHATQ+P+++R NE+ GRF FYG MI EG+IA+IW ++S 
Sbjct: 237 LWPALFITIACGAISGFHATQSPLMARCMHffiKNGRFVFYGAMIGEGIIALIWCTVALSF 296 

Query: 291 FKG-QNLYEMIAAGTPSAWNQVMLMLLGSVIGTIAIIGVIVLPVSSGLSAFRSLRTIVA 349 
F+LE+GPW LLG G IA +GV++LP++SG +AFRS R I+A

Sbjct: 297 FGSLEALSEAVKNGGPGNWYGASFGLLGVFGGVIAFLGWILPITSGDTAFRSSRLILA 356 

Query: 350 DYIHVKQDTLPKIFAVTIPLYVTSFVLTHVDFNLLWRYFNWANQVTAVIGLLVATRYLIL 409
+Y +++Q TL + +PL+VI VLT VDF ++WRYF +ANQ TAV+ L AT YL+ 

Sbjct: 357 EYFNMEQKTLRNRLLMAVPLFVIGAVLTQVDFGIIWRYFGFANQATAVMMLWTATAYLMR 416

Query: 410 KRRNYWVTFVPAMFMLYAVWYIL-SQPIGFNMGLGILTYSLALVLTGIL 458 

+ +W+ VPA+FM + +IL S +GF ++IT+ L GL 
Sbjct: 417 HNKLHWICTVPALFMTTVCISFILNSSTLGFGLPMQISTIAGVLASLGAL 466 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8623> and protein <SEQ ID 8624> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: 6.07

GvH: Signal Score (-7.5): -3.54

Possible site: 19
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 11 value: -11.25 threshold: 0.0
INTEGRAL Likelihood =-11.25 Transmembrane 129 - 145 ( 122 - 157)

INTEGRAL Likelihood = -9.92 Transmembrane 316 - 332 ( 305 - 342) 
INTEGRAL Likelihood = -6.42 Transmembrane 164 - 180 ( 157 - 181) 
INTEGRAL Likelihood = -5.57 Transmembrane 416 - 432 ( 414 - 435) 
INTEGRAL Likelihood = -4.88 Transmembrane 190 - 206 ( 183 - 209) 
modified ALOM score: 2.75

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
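
Each "Final Results" block lists a certainty value for three candidate compartments, and the predicted localization is the one with the highest value. The following Python is a minimal sketch of reading such a block, assuming lines shaped like those above. The function names and the regular expression are illustrative assumptions and do not describe how the original prediction program works internally.

```python
import re

# Assumed line shape: "bacterial <compartment> [---] Certainty=<value> (<verdict>) ..."
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\((.*?)\)"
)


def parse_results(lines):
    """Return (compartment, certainty, verdict) tuples from 'Final Results' lines."""
    results = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            compartment, raw_value, verdict = match.groups()
            # Scanned text sometimes splits the number ("0. 5501"); drop inner spaces.
            results.append((compartment, float(raw_value.replace(" ", "")), verdict))
    return results


def best_localization(lines):
    """Return the tuple with the highest certainty, or None if nothing parsed."""
    results = parse_results(lines)
    return max(results, key=lambda item: item[1]) if results else None


example = [
    "bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>",
    "bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < succ>",
]
print(best_localization(example))
```

The pattern deliberately tolerates stray spaces inside the number, since several blocks in this text carry that scanning damage.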

The protein has homology with the following sequences in the databases: 

ORF01729(301 - 1668 of 2082) 

GP|9655126|gb|AAF93852.1||AE004154(1 - 464 of 494) carbon starvation protein A, putative
{Vibrio cholerae}

%Match =29.9 

%Identity =47.6 %Similarity =68.6 

Matches = 218 Mismatches = 138 Conservative Sub.s = 96 

25 174 204 234 264 294 324 354 384 

TNEKLFIIKLRLFISKKQPFILKIGNFNFSMLY*SHENA**N*AKKFLGGSDMVTFLGGVALLIVGYFTYGRYIEKNFQI 

1= II II 1= III II ::|| I I 
MLWFLTCVAALVGGYFIYGAFVEKVFGI 
10 20 

30 

414 444 474 504 534 564 594 624 

DENRQTPAEALRDGYDWPMPKWKNGMIELLNIAGTGPIFGPILGALYGPVAYIWIVLGCIFAGAVHDYMIGMISLRNNG 

= 1 Mill II hill I ::>|||||| MIMMMMIM I :||| = IIIIIIIIIM Ihhll I 
NEKRQTPAHTKTDGVDYVPMSTPKVYLVQLIJSIIAGVGPIFGPIMGALYGPAAMLWIWGCIFAGAvHDYFSGMLSIRNGG 
35 40 50 60 70 80 90 100 

654 684 714 744 774 795 825 855 

AYLPELASRYLGKSMKHVINIFSMLLLILVATVFWTPANL1LSILPAGT - - - LSLPWI IGLI FVYYLI STVLPIDKALG 

I :| = Mil II :|||:: = l|:|| Ml II = I = = = I =1= = = Ml ll — MMMI M 
40 ASVPS ITGRYLGNGAKHFMNI FAIVLLLLVGWFVSAPAGMITNLINQQTDFTVSMTTMWI IFAYYILATIVP VDKI IG 

120 130 140 150 160 170 180 

894 924 954 984 1014 1044 1074 

KVYP VFCVI - LMVSTAAVGFRLLTGGFDMPNLTFETFKNMHPAGLGIFPALFFTI SCGAI SGFHATQAPMVSRT 

45 : || = I =11 = 1 = 111 = = == ||::| : = = 1111 I I M II I I I I I I I I = I = = M 

RFYPLFGALLIFMSVGLMTAIAFSSEHQVLGGFEISDM VKNLNPNDMPLWPALFITIACGAISGFHATQSPLMARC 

200 210 220 230 240 250 260 

1104 1134 1164 1191 1221 1251 1281 1311 

50 TVNEREGRFTFYGMMIAEGVIAMIWAGASMSLFKG-QNLYEMIAAGTPSAVVNQVMLMLLGSVIGTIAIIGVIVLPVSSG 

11= III III II 11=11=11 ==l=l = 1 I = I I II =111 III =ll==lh=ll 
MENEKNGRFVFYGAMIGEGIIALIWCWALSFFGSLFJU^SEAVKNGGPGNVVYGASFGLLGVFGGVIAFLGWILPITSG 
280 290 300 310 320 330 340 

55 1341 1371 1401 1431 1461 1491 1521 1551 

LSAFRSLRTIVADYIHVKQDTLPKIFAVTIPLYVISFVLTHVDFNLLTOYFNWANQVTAVIGLLVATRYLILKRRNYIOT 

=1111 I 1=1=1 ===l II = = =11=11 111=111 ==1111 =111 111= I II 11= = =1= 
DTAFRSSRLILAEYFN^QKTLRNRLLMAVPLFVIGAvLTQVDFGIIlTOYFGFANQATAVMMLWTATAYLNIRHNKLHWIC 
360 370 380 390 400 410 420 

60 

1581 1608 1638 1668 1698 1728 1758 1788 

FWAMFMLYAW\reiL-SQPIGFNMGLGILTYSLALVLTGIXVGLFWKSGQKQLKTVHPFAFLFNDHRPINYYSSLDS*Y 

|||:)| = =11 I =11 = = I I = I 1 
WPALFMTTVCISFIIMSSTLGFGLPMQISTIAGVLASLGALAYVAKA?SKGKGETDIiADEEKPOGVTKTA 
65 440 450 460 470 480 490 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 630 

A DNA sequence (GBSx0670) was identified in S.agalactiae <SEQ ID 1959> which encodes the amino 
acid sequence <SEQ ID 1960>. This protein is predicted to be lytR (lytT). Analysis of this protein sequence 
reveals the following: 
Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.80 Transmembrane 27 - 43 ( 27 - 43) 



Final Results

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB48183 GB:L42945 lytR [Staphylococcus aureus] 
Identities = 93/245 (37%) , Positives = 150/245 (60%) , Gaps = 3/245 (1%) 

Query: 1 MKVLVVDDEPVARNELIYLLNKYDSNLVIAEAHDMATALAILLRETFDVALLDIHLRDDS 60

MK L++DDEP+ARNEL YLLN+ I EA ++ L LL +D+ LD++L D++ 

Sbjct: 1 MKALIIDDEPIARNELTYLIiNEIGGFEEINEAENVKETLEALLINQYDIIFLDVNLMDEN 60 

Query: 61 GLQLAEYINKMPKPPLLIFATAYDQYAIQAFEHDARDYLLKPYDFDRLKQAMDRVXGALS 120 
G++L I KM +PP +IFATA+DQYA+QAPE +A DY+LKP+ R++QA+++V+ +

Sbjct: 61 GIELGAKIQKMKEPPAI I FATAHDQYAVQAPELNATDYILKPFGQKRIEQAVNKVRATKA 120 

Query: 121 TSTIIESVTSGPL FKQQYPLTVEDRIYLVSADDILLIEAMQGKLIIQTPDKNYEIDG 177 

S + + F Q P+ ++D+I+++ +1+ I G I T + YE 
Sbjct: 121 KDDNNASAIANDMSANFDQSLPVEIDDKIHMLRQQNIIGIGTHNGITTIHTTNHKYETTE 180

Query: 178 SLQQWQDKLPSSQFWvHRSYIWINAIKTIEPWFNQTLQLHLCNKIWPVSRANVKPLK 237 

L +++ +L + F+R+HRSYI+N IK ++ WFN T +LN++VR++K K 
Sbjct: 181 PLNRYEKRLNPTYFIRIHRSYIINTKHIKEVQQWFNYTYMVILTNGVKMQVGRSFMKDFK 240 

Query: 238 QMLGI 242 
+G+ 

Sbjct: 241 ASIGL 245 

There is also homology to SEQ ID 460.

SEQ ID 1960 (GBS399) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 75 (lane 7; MW 30.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 84 (lane 2; MW 55kDa). Purified 
GBS399-GST is shown in Figure 217, lane 9; purified GBS399d-GST is shown in Figure 236, lane 3. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 631 

A DNA sequence (GBSx0671) was identified in S.agalactiae <SEQ ID 1961> which encodes the amino
acid sequence <SEQ ID 1962>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.59 Transmembrane 95 - 111 ( 86 - 116) 
INTEGRAL Likelihood = -5.95 Transmembrane 155 - 171 ( 152 - 175)

INTEGRAL Likelihood = -2.28 Transmembrane 189 - 205 ( 187 - 206) 

INTEGRAL Likelihood = -1.49 Transmembrane 122 - 138 ( 121 - 138) 
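
The INTEGRAL lines above each give a likelihood score and the residue range of a predicted transmembrane segment, with more negative scores listed first. As a rough sketch, assuming that line layout, the following Python extracts the segments and orders them along the sequence. The names parse_segments, "core" and "outer" are illustrative, and reading the parenthesised pair as an extended range around the segment is an assumption rather than something stated in the text.

```python
import re

# Assumed layout: "INTEGRAL Likelihood = -7.59 Transmembrane 95 - 111 ( 86 - 116)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return predicted segments sorted by their start position in the sequence."""
    segments = []
    for match in TM_RE.finditer(text):
        start, end, outer_start, outer_end = (int(g) for g in match.groups()[1:])
        segments.append({
            "likelihood": float(match.group(1)),
            "core": (start, end),                # first residue range on the line
            "outer": (outer_start, outer_end),   # parenthesised range (assumed meaning)
        })
    return sorted(segments, key=lambda seg: seg["core"][0])


report = """
INTEGRAL Likelihood = -7.59 Transmembrane 95 - 111 ( 86 - 116)
INTEGRAL Likelihood = -5.95 Transmembrane 155 - 171 ( 152 - 175)
INTEGRAL Likelihood = -2.28 Transmembrane 189 - 205 ( 187 - 206)
INTEGRAL Likelihood = -1.49 Transmembrane 122 - 138 ( 121 - 138)
"""
for seg in parse_segments(report):
    print(seg["core"], seg["likelihood"])
```

Sorting by position rather than by score makes it easier to see how the predicted segments are spaced along the protein.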



Final Results

bacterial membrane --- Certainty=0.4036 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB48182 GB:L42945 lytS [Staphylococcus aureus] 
Identities = 264/570 (46%) , Positives = 389/570 (67%) , Gaps = 2/570 (0%) 

Query: 1 MTLFLIMMERAGLIILLAYAFVHIPFIKQTLKQPELKKHQYILLILFSLFAIISNFTGVE 60 
++L ++++ER GLII+LAY ++IP+ K + + K ++ L I+FSLFA++SN TG+

Sbjct: 2 LSLTMLLLERVGLIIILAYVLMNIPYFKNLMNRRRTWKARWQLCIIFSLFALMSNLTGIV 61 

Query: 61 IQSDLSIIPQTLNHIADQSSVANTRVLTIGVSGLIGGPIVGIIVGLLSVFVRYLQGGLAP 120 
I S+ + D S+ANTRVLTIGV+GL+GGP VG+ VG++S R GG 

Sbjct: 62 IDHQHSLSGSVYFRLDDDVSLANTRVLTIGVAGLVGGPFVGLFVGVISGIFRVYMGGADA 121

Query: 121 HIYVISSLLIGLCSGLSGNYLRKNYNKIRVLDAMWGFGMEILQMICILIFSVDFNQALR 180 

+Y+ISS+ IG+ +G G ++ + + ++G ME++QM+ IL FS D A+ 

Sbjct: 122 QVYLISSIFIGIIAGYFGLQAQRRKRYPSIAKSAMIGIVMEMIQMLSILTFSHDKAYAVD 181 


Query: 181 LVSFISMPMILSNTLGLGIFISIISSTQKLEEHAKAFQTHQVLELANLTLPYLRKGLTTE 240

L+S I++PMI+ N++G IF+SII T K E+ K QTH VL+L N T PY ++GL E 
Sbjct: 182 LISLIALPMIIVNSVGPAIFMSIIIPTLKQEDQMKPVQTHDVLQLMNQTFPYFKEGLNRE 241 

Query: 241 SCQPVAEIIHKHMDVSAVSLTSQSAIIAYVGK3ADHHLP 300

S Q +A II M VS+V++TS++ IL++VG G+DHH+P +ILT L+K + +GK+ 
Sbjct: 242 SAQQIAMIIKNLMKVSSVAITSKNEILSHVGGGSDHHIPTNEILTSLSKDVLKSGKLKEV 301 

Query: 301 TDKSEIECDHKNCPLSSAIVIPLHIHDVIVGTLKLYFSDAQHMTYVDRQLAEGLGNIFST 360 
K EI C H NCPL +AIVIPL +H IVGTLK+YF++ +T+V+RQLAEGL NIFS+

Sbjct: 302 HTKEEIGCSHPNCPLRAAIVIPLEMHGSIVGTLKMYFTNPNDLTFVERQLAEGLANIFSS 361 

Query: 361 QLALGQAEFATRLLQDAEMKSLQAQVNPHFLraAI^IYGLIRMDSEKARKLVQDFSKVI 420 
Q+ LG+AE ++LL+DAE+KSLQAQV+PHF FN++N I L+R++SEKAR+L+ + S 
Sbjct: 362 QIELGEAETQSKLLKDAEIKSLQAQVSPHFFFNSINPISALVRINSEKARELLLELSYFF 421

Query: 421 RANLQRAKQNLIPLHDELEQVNAYLALEEARFPNMVAFNLDNQTNSDDNLMIPPFTLQVL 480 

RANLQ 4-KQ+ I L EL QV AYL+LE+AR+P N++ + D +++PPF +Q+L 

Sbjct: 422 RANLQGSKQHTITLDKELSQVRAYLSLEQARYPGRFNININVEDKYRD-VLVPPFLIQIL 480 

Query: 481 IENSYKHAFKmWKNNQLKVTIARNN-DRLHIIVQDNGIGIPKEKLITLGKKTQISKQGS 539 

+EN+ KHAF + + N + V++ + + IIVQDNG GI K+K+ LG+ + S+ G+ 

Sbjct: 481 VENAIKHAFTNRKQGNDIDVSVIKETATHVRIIVQDNGQGISKDKMHLLGETSVESESGT 540 

50 Query: 540 GTAIENLVRRLNI IYDGQASLKFESNDSGT 569 

G+A+ENL RL ++ A+L+FES SGT 
Sbjct: 541 GSALENLNLRLKGLFGKSAALQFESTSSGT 570 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1963> which encodes the amino acid
sequence <SEQ ID 1964>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.79 Transmembrane 283 - 299 ( 276 - 307)
INTEGRAL Likelihood = -5.57 Transmembrane 27 - 43 ( 24 - 48)

Final Results

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB54576 GB:AJ006396 histidine kinase [Streptococcus pneumoniae] 
Identities = 115/231 (49%) , Positives = 159/231 (68%) , Gaps = 7/231 (3%) 

Query: 351 MIASIKAyiDEVYvLEVEQRDAQMRALQSQIOTHFLYOTLEyiRMYALSCQQEELRDVIY 410 

ML ++ I ++Y LE+ Q+DA MRALQ+QINPHF+YNTLE++RMYA+ Q+ELAD+IY 
Sbjct: 1 MLDRLEKNIHDIYQLELSQKDANMRALQAQINPHFMYNTLEFLRMYAVMQSQDELADIIY 60 

Query: 411 AFASLLRNNISQDKMTTLKEELAFCEKYIYLYQMRYPDSFAYHVKIDESVADLAIPKFVI 470

F+SLLRNNIS ++ T LK+EL FC KY YL +RYP S AY KID + ++ IPKF + 
Sbjct: 61 EFSSLLRNNISDERETLLKQELEFCRKYSYLCMVRYPKSIAYGFKIDPELENMKIPKFTL 120 

Query: 471 QPLVENYFVHGIDYSRHDNALSIKALDETDHLLIQVLDNGRGISQERLADMEKRLQ 526 

QPLVENYF HG+D+ R DN +SIKAL + +1 V+DNGRG+S E+LA++ ++L

Sbjct: 121 QPLVENYFAHGVDHRRTDNVISIKALKQDGFVEILVVDNGRGMSAEKIANIREKLSQRYF 180 

Query: 527 EHQTT GNSSIGLQNWLRLFHHFRDRVSWSMAKEPNGGFIIQIRIRKD 574 

EHQ + SIG+ NV+ R +F DR + ++ G +1 1+ + 

Sbjct: 181 EHQASYSDQRQSIGIVNVHERFVLYFGDRYAITIESAEQAGVQYRITIQDE 231

An alignment of the GAS and GBS proteins is shown below: 

Identities = 59/180 (32%) , Positives = 97/180 (53%) , Gaps = 8/180 (4%) 

Query: 375 QDAEMKSLQAQVNPHFLFNALNTI--YGLIRMDSEKARKLVQDFSKVIRANLQRAKQNLI 432

+DA+M++LQ+Q+NPHFL+N L I Y L E A ++ F+ ++R N+ + K + 

Sbjct: 370 RDAQMRALQSQINPHFLYNTLEYIRMYALSCQQEELA-DVIYAFASLLRNNISQDK--MT 426 

Query: 433 PLHDELEQVNAYLALEEARFPNMVAFNIiD^^ 492 
L +EL Y+ L + R+P+ A+++ + D L IP F +Q L+EN + H +

Sbjct: 427 TLKEELAFCEKYIYLYQMRYPDSFAYHVKIDESVAD-IAIPKFVIQPLVENYFVHGIDYS 485 

Query: 493 NK3slNQLK\rriARHNDRLHIIVQDNGIGIPKEKLITLGKKTQISKQ- -GSGTAIENLVRRL 550 
+N L + D L I V DNG GI +E+L + K+ Q + S ++N+ RL 

Sbjct: 486 RHDNALSIKALDETDHDLIQVLDNGRGISQERLADMEKRLQEHQTTGNSSIGLQNVYLRL 545

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 632 

A DNA sequence (GBSx0672) was identified in S.agalactiae <SEQ ID 1965> which encodes the amino
acid sequence <SEQ ID 1966>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9827> which encodes amino acid sequence <SEQ ID 9828>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 633

A DNA sequence (GBSx0673) was identified in S.agalactiae <SEQ ID 1967> which encodes the amino 
acid sequence <SEQ ID 1968>. Analysis of this protein sequence reveals the following: 

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8625> and protein <SEQ ID 8626> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -8.54 
GvH: Signal Score (-7.5): -5.6 

Possible site: 57
>>> Seems to have no N-terminal signal sequence
modified ALOM score: 2.41
*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 634 

A DNA sequence (GBSx0674) was identified in S.agalactiae <SEQ ID 1969> which encodes the amino 
acid sequence <SEQ ID 1970>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.53 Transmembrane 83 - 99 ( 83 - 99)

Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 635 

A DNA sequence (GBSx0675) was identified in S.agalactiae <SEQ ID 1971> which encodes the amino 
acid sequence <SEQ ID 1972>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1902 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 636 

A DNA sequence (GBSx0676) was identified in S.agalactiae <SEQ ID 1973> which encodes the amino
acid sequence <SEQ ID 1974>. Analysis of this protein sequence reveals the following:

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4763 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 637 

A DNA sequence (GBSx0677) was identified in S.agalactiae <SEQ ID 1975> which encodes the amino 
acid sequence <SEQ ID 1976>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5089 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 638 

A DNA sequence (GBSx0678) was identified in S.agalactiae <SEQ ID 1977> which encodes the amino 
acid sequence <SEQ ID 1978>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> May be a lipoprotein 



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1978 (GBS184) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 26 (lane 7; MW 21kDa), in Figure 168 (lane 14-16; MW 36kDa - thioredoxin 
20 fusion) and in Figure 238 (lane 9; MW 36kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 37 (lane 7; MW 46.4kDa). 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 639 

A DNA sequence (GBSx0679) was identified in S.agalactiae <SEQ ID 1979> which encodes the amino 
acid sequence <SEQ ID 1980>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2179 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 640 

A DNA sequence (GBSx0680) was identified in S.agalactiae <SEQ ID 1981> which encodes the amino 
acid sequence <SEQ ID 1982>. This protein is predicted to be an immunogenic secreted protein precursor. 
Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2166 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9351> which encodes amino acid sequence <SEQ ID 9352> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1983> which encodes the amino acid 
sequence <SEQ ID 1984>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -3.77 Transmembrane 9 - 25 ( 5 - 27) 

Final Results 

bacterial membrane --- Certainty=0.2508 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 64/86 (74%) , Positives = 76/86 (87%) 

Query: 1 MGNGGDWKNKPGYQTTHEAKTGYAISFSPGQAGADRTYGHVAIVEDVKEDGSIPISESNV 60 
 MGNGGDW+ KPG+ TTH+ K GY +SF+PGQAGAD TYGHVA+VE +KEDGSI ISESNV 
Sbjct: 452 MGNGGDWQRKPGFOTTHKPKVGYWSFAPGOJVGATATYGIT^vVEQIKEDGSILISESNV 511 

Query: 61 LGLGTISYRTFSAAEAAQLTYWGEK 86 
 +GLGTISYRTF+A +A+ LTYWG+K 
Sbjct: 512 MGLGTISYRTFTAEQASLLTYWGDK 537 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 641 

A DNA sequence (GBSx0681) was identified in S.agalactiae <SEQ ID 1985> which encodes the amino 
acid sequence <SEQ ID 1986>. This protein is predicted to be an immunogenic secreted protein precursor. 
Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2495 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 



>GP:AAB52379 GB:U31811 immunogenic secreted protein precursor [Streptococcus pyogenes] 
Identities = 133/259 (51%) , Positives = 170/259 (65%) , Gaps = 4/259 (1%) 

Query: 3 PSQPQVTATPQKSEVVTPAITSGIDLPDVAIPTAMASAAYVKHWIGNDAYTHNLLSHRYG 62 
 PQP+A +VP SDL+ P++ +SAAYV+HW G+ AYTHNLLS RYG 
Sbjct: 174 PIQPPLGAA---APVFAPWRESDKDLSKLK-PSSRSSAAYVRHWTGDSAYTHNLLSRRYG 229 

Query: 63 ITAAQLDGFLQSTGITYDSSRIDGQKILDREKSSGLDARAIIAIAIAESSLGTQGVATAP 122 
 ITA QLDGFL S GI YD R++G+++L+ EK +GLD RAI+AIA+AESSLGTQGVA 
Sbjct: 230 ITAEQLDGFIJSISLG1HYDKERIMGKRLLEWEKLTGLDVRAIVAIAMAESSLGTQGVAKEK 289 

Query: 123 GANMFGFGAVDNNTTNAQNFSDDKAVIKMTQETIIQNQNTSFAIQDQKAQFLSTGNLNVA 182 
Sbjct: 290 GSNMFGYGAFDFNPlMAKKySDEVAIRHMVEDTIIANKNQTFERQDLKAKKWSLGQIiDTL 349 

Query: 183 ARGGVYFTDASGSGKRRAAIMESIDKWIDAHGGISEISKELLNTSSVAMMAVPTSYSVSR 242 
 GGVYFTD SGSG+RRA IM +D+WID HG +1 + L TS VP Y S+ 
Sbjct: 350 IDGGVYFTDTSGSGQRRADIMTKLDQWIDDHGNTPDIPEHLKITSGTQFSEVPVGYKRSQ 409 

Query: 243 ANQAGNYVAGTYPWGQRTW 261 
 Y + TY +GQ TW 
Sbjct: 410 PQNVLTYKSETYSFGQCTW 428 
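
The summary line that precedes this alignment (Identities = 133/259, Positives = 170/259, Gaps = 4/259) can be checked arithmetically. The sketch below parses such a line and recomputes each percentage. It assumes the percentages are truncated to whole numbers, which reproduces this particular line; other summary lines in this document may differ by a point, so the truncation rule should be read as an assumption rather than as the alignment program's documented behaviour. The name `parse_summary` is hypothetical.

```python
import re

# Matches "Identities = 133/259", "Positives = 170/259", "Gaps = 4/259".
SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def parse_summary(line):
    """Return {field: (count, alignment_length, truncated_percent)}."""
    stats = {}
    for name, count, length in SUMMARY.findall(line):
        count, length = int(count), int(length)
        # Integer division truncates, e.g. 133 * 100 // 259 == 51.
        stats[name] = (count, length, count * 100 // length)
    return stats

line = "Identities = 133/259 (51%) , Positives = 170/259 (65%) , Gaps = 4/259 (1%)"
stats = parse_summary(line)
for name, (count, length, percent) in stats.items():
    print(f"{name}: {count}/{length} = {percent}%")
```

The alignment length of 259 also agrees with the coordinates: the query runs from residue 3 to residue 261 (259 residues, no gaps), while the subject covers residues 174 to 428 (255 residues) plus 4 gap positions.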





A related DNA sequence was identified in S.pyogenes <SEQ ID 1987> which encodes the amino acid 
sequence <SEQ ID 1988>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 143/265 (53%) , Positives = 184/265 (68%) , Gaps = 5/265 (1%) 



Query: 2 VPSQPQVTATPQKSEVVTPA-----ITSGIDLPDVAIPTAMASAAYVKHWIGNDAYTHNL 56 
 V+P++QETP S +DL ++ IP+ AAYV+HW G +AYTH+L 
Sbjct: 135 VDTAPASSLSKQLPEARTPIQSLSPYVSDLDLSEIDIPSVNTYAAYVEHWSGKNAYTHHL 194 

Query: 57 LSHRYGITAAQLDGFLQSTGITYDSSRIDGQKILDREKSSGLDARAIIAIAIAESSLGTQ 116 
 LS RYGI A Q+D +L+STGI YDS+RI+G+K+L EK SGLD RAI+AIA++ESSLGTQ 
Sbjct: 195 LSRRYGIKADQIDSYLKSTGIAYDSTRINGEKLLQWEKKSGLDVRAIVAIAMSESSLGTQ 254 

Query: 117 GVATAPGANMFGFGAVDNNTTNAQNFSDDKAVIKMTQETIIQNQNTSFAIQDQKAQFLST 176 
 G+AT GANMFG+ A D + T A F+DD A++KMTQ+TII+N+N++FA+QD KA S 
Sbjct: 255 GIATLLGANMFGYAAFDLDPTQASKFNDDSAIVKMTQDTIIKNKNSNFALQDLKAAKFSR 314 

Query: 177 GNLNVAARGGVYFTDASGSGKRRAAIMESIDKWIDAHGGISEISKELLNTSSVAMMAVPT 236 
 G LN A+ GGVYFTD +GSGKRRA IME +DKWID HGG I EL SS + +VP 
Sbjct: 315 GQLNFASDGGVYFTDTTGSGKRRAQIMEDLDKWIDDHGGTPAIPAELKVQSSASFASVPA 374 

Query: 237 SYSVSRANQAGNYVAGTYPWGQRTW 261 
 Y +S++ Y A +Y WGQ TW 
Sbjct: 375 GYKLSKSYDVLGYQASSYAWGQCTW 399 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 642 

A DNA sequence (GBSx0682) was identified in S.agalactiae <SEQ ID 1989> which encodes the amino 
acid sequence <SEQ ID 1990>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8627> and protein <SEQ ID 8628> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 11.56 
GvH: Signal Score (-7.5): 0.870001 
Possible site: 27 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 11.88 threshold: 0.0 
PERIPHERAL Likelihood = 11.88 63 
modified ALOM score: -2.88 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

SEQ ID 8628 (GBS159) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 28 (lane 4; MW 26kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 34 (lane 2; MW 41kDa). 

GBS159-GST was purified as shown in Figure 198, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 643 

A DNA sequence (GBSx0683) was identified in S.agalactiae <SEQ ID 1991> which encodes the amino 
acid sequence <SEQ ID 1992>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2668 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04699 GB:AP001510 unknown conserved protein [Bacillus halodurans] 
Identities = 32/76 (42%) , Positives = 54/76 (70%) 

Query: 7 LGSVIELKNDSQKVMITSRFPLYDNEGQLGYFDYSGCIFPISIVGNETYFFNLEDIDKVL 66 

+GS++ LK + K+MI +R P+ + G+ FDYSGC +P +V ++ ++FN E+ID+V+ 
Sbjct: 4 IGSIVYLKEGTSKLMILNRGPILEANGENKMFDYSGCFYPQGLVPDKVFYFNHENIDEVV 63 

Query: 67 FEGYYDENEEEMQKIF 82 

FEG+ D+ E+ QK+F 
Sbjct: 64 FEGFQDDEEQRFQKLF 79 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 644 

A DNA sequence (GBSx0684) was identified in S.agalactiae <SEQ ID 1993> which encodes the amino 
acid sequence <SEQ ID 1994>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-14.81 Transmembrane 75 - 91 ( 69 - 99) 

INTEGRAL Likelihood =-14.38 Transmembrane 134 - 150 ( 129 - 179) 

INTEGRAL Likelihood = -8.49 Transmembrane 157 - 173 ( 151 - 179) 

INTEGRAL Likelihood = -1.17 Transmembrane 50 - 66 ( 46 - 67) 
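
Each INTEGRAL line above gives a likelihood score, a predicted transmembrane core and, in parentheses, a wider range. The sketch below shows one way such lines could be parsed and ordered along the sequence. The function name `parse_segments` is hypothetical, and treating the most negative score as the strongest prediction is an assumption drawn from the pattern in these reports, where INTEGRAL segments carry negative values and PERIPHERAL ones positive values against a threshold of 0.0.

```python
import re

# "INTEGRAL Likelihood =-14.81 Transmembrane 75 - 91 ( 69 - 99)"
SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return predicted segments sorted by the start of their core region."""
    segments = []
    for match in SEGMENT.finditer(text):
        score = float(match.group(1))
        start, end, low, high = (int(g) for g in match.groups()[1:])
        segments.append({"score": score, "core": (start, end), "range": (low, high)})
    return sorted(segments, key=lambda seg: seg["core"][0])

# Lines copied from the report for SEQ ID 1994 above.
report = """
INTEGRAL Likelihood =-14.81 Transmembrane 75 - 91 ( 69 - 99)
INTEGRAL Likelihood =-14.38 Transmembrane 134 - 150 ( 129 - 179)
INTEGRAL Likelihood = -8.49 Transmembrane 157 - 173 ( 151 - 179)
INTEGRAL Likelihood = -1.17 Transmembrane 50 - 66 ( 46 - 67)
"""

segments = parse_segments(report)
strongest = min(segments, key=lambda seg: seg["score"])
for seg in segments:
    start, end = seg["core"]
    print(f"{start:>4}-{end:<4} length={end - start + 1:>2} score={seg['score']}")
```

In this report every core region spans 17 residues, and the four segments lie between residues 50 and 173 of the protein.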



Final Results 

bacterial membrane --- Certainty=0.6922 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 645 

A DNA sequence (GBSx0685) was identified in S.agalactiae <SEQ ID 1995> which encodes the amino 
acid sequence <SEQ ID 1996>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.11 Transmembrane 40 - 56 ( 40 - 56) 

Final Results 

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1996 (GBS204) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 13; MW 32kDa) and Figure 53 (lane 2; MW 14.7kDa). It was also 
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 54 
(lane 6; MW 39.7kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 646 

A DNA sequence (GBSx0686) was identified in S.agalactiae <SEQ ID 1997> which encodes the amino 
acid sequence <SEQ ID 1998>. Analysis of this protein sequence reveals the following: 



Possible site: 38 

>>> Seems to have no N-terminal signal sequence (or signal = aa 1-26) 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC16670 GB:AJ302698 hypothetical protein [Staphylococcus 
haemolyticus] 
Identities = 60/254 (23%) , Positives = 109/254 (42%) , Gaps = 14/254 (5%) 

Query: 2 VKVSVSSVGTQASTVAISMFSRVSALNDAITKLSSFAEAATLQGTAYSNAKSYATGTLTP 61 
 + + V +Q+S V ++ S S + + F A+ LQG AY + K + + + P 
Sbjct: 3 IDMYVGKSKSQSSDVGSTVKSISSGYDSLQKGIMQFVGASELQGQAYDSGKQFFSAVIAP 62 

Query: 62 MLQGMILFSETLSEKCTELQTLYVSICGDEDIJDSvVLESKLASDRASLKIAFALLEHLND 121 
 + + + E+C+ YS +L L + + EA+ L 
Sbjct: 63 LTESIKTLGELTEQACNDFVDQYQSEVDSQSLKESELLEDIEELNKQISQLEAMNASIiKH 122 

Query: 122 DPEPSKSAISSTKSNIKKLKKRIKSNQKXLDNI^FNAHSATOFADISNAQSTVNQALAA 181 
 + S +S I L+++ K ++KL L +F+A S +F ++ + Q TV Q + 
Sbjct: 123 KSSKNSSLLSGNHQMISSLEQQKKELEEKLRKLRQFDAKSPNIFKEVESFQKTVQQGINQ 182 

Query: 182 VSTGFSGYNSKTGAFGKPTSGQMEWTKTVKKNWKEREDAKAEELKSKKAEESKKASKIEN 241 
 T ++ F P MEW K ++ E K +++ ++KA++ KK SK + 
Sbjct: 183 AKT AWDPGKQTFNIPAGKDMEWAKVSQQKALE VKMDKI -NQKAKDGKKLSKNDI 235 

Query: 242 TT KKSNV 248 
 T KKSN+ 
Sbjct: 236 FTI IAYQQQKKSNI 249 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1998 (GBS270) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 2; MW 34.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 54 (lane 7; MW 59.2kDa). 

The GBS270-GST fusion product was purified (Figure 206, lane 3) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 265), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 647 

A DNA sequence (GBSx0687) was identified in S.agalactiae <SEQ ID 1999> which encodes the amino 
acid sequence <SEQ ID 2000>. This protein is predicted to be outer surface protein F. Analysis of this 
protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3323 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 2000 (GBS316) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 3; MW 23kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 55 (lane 2; MW 41.8kDa). 

GBS316-GST was purified as shown in Figure 206, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 648 

A DNA sequence (GBSx0688) was identified in S.agalactiae <SEQ ID 2001> which encodes the amino 
acid sequence <SEQ ID 2002>. This protein is predicted to be actin-like protein arp3 (act4). Analysis of this 
protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0217 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 649 

A DNA sequence (GBSx0689) was identified in S.agalactiae <SEQ ID 2003> which encodes the amino 
acid sequence <SEQ ID 2004>. This protein is predicted to be a diarrheal toxin. Analysis of this protein 
sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8.65 Transmembrane 65 - 81 ( 61 - 84) 
INTEGRAL Likelihood = -3.98 Transmembrane 89 - 105 ( 85 - 106) 

Final Results 

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15175 GB:Z99120 alternate gene name: yueA-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 452/1058 (42%) , Positives = 664/1058 (62%) , Gaps = 39/1058 (3%) 

Query: 98 WMIFSITGYFKNRKQYKQDLQERIDSYHDYLSDKSIELQKLAKEQKRGQHYHYPTIEGL 157 

+T+I S YF+++ Q K+ ++R Y YL +K ELQ LA++QK+ +H+P+ E + 

Sbjct: 1 MTLITSTVQYFRDKNQRKKREEKRERWKLYUJNKRKELOAIjAEKQRQvLEFHFPSFEQM 60 

Query: 158 QEMADTYHHRIYEKTPLHFDFLYYRLGLGEVPTSYNIHYSQPERSGKK-DPLENEGYNLY 216 

+ + RI+EK+ D+L RLG G VP+SY I+S + + + DL + ++ 

Sbjct: 61 KYLTSEISDRIWEKSLESKDYLQLRLGTGTVPSSYEINMSGGDLANRDIDDLMEKSQHMQ 120 

Query: 217 FNNRYIKNMPIVANLSHGPVGYIGPRGLVLEQLQLMVNQLAFFHSYHDVQFITIVPEEEM 276 
+ I+N P+ +L+ GP+G +G +V ++ ++ QL+FF+SYHD++F+ I EEE 
Sbjct: 121 RVYKDIRNAPVTVDLAEGPMGLVGKSQIVKNEIHQLIGQLSFFNSYHDLRFVFIFHEEEY 180 

Query: 277 DKWSWMRWLPHETLQDVNWGFVYNQRSRDQVLNSI^ 336 
W WM+ +P + + +GF+YN+++RDQ+L+SL ++++ +R+ + KE F P 

Sbjct: 181 KDWEWMKCVPQFQMPHIYAKGFIYNEQTRDQLLSSLYELIR ERDLEDDKEKLQFKP 236 

Query: 337 HYWIVTDEKLILDHVIMEFFTEDPTELGCSLIFVQDVMSSLSENIKTIINIKDRNTGQL 396 
H+V ++T+++LI +HVI+E+ LG S I + SLSENI T++ + + G + 

Sbjct: 237 HFVFVITNQQLISEHVILEYLEGQHEHLGISTIVAAETKESLSENITTLVRYINEHEGDI 296 

Query: 397 VIEEGELKETDFELDHFLEDYDKENISRRLAPIiNHLQNLKSSIPEAVTFMEMYQAEEFED 456 

+I++ + F LDH + D E SR L LNH + +SIPE V+F+E++ A+E ++ 

Sbjct: 297 LIQKKKAVRIPFRLDHHQRE-DNERFSRTLRTLNHQVGITNSIPETVSFLELFHAKEVKE 355 

Query: 457 LHVQERWISHAPYKSSAVPLGLRGQDDIVYLNLHEKAHGPHGLVAGTTGSGKSEIIQSYI 516 

+ +Q+RW++ KS +VP+G +G+DDIVYLKLHEKAHGPHGL+AGTTGSGKSE +Q+YI 
Sbjct: 356 IGIQQRWLTSESSKSLSVPIGYKGKDDIVYLNLHEKAHGPHGLLAGTTGSGKSEFLQTYI 415 

Query: 517 LSLAVNFHPHDVAFLLIDYKGGGMANIiFKDLPHLLGTITNLDGAQ--SMRALVSINAELK 574 

LSLAV+ FHPH+ AFLLIDYKGGGMA F+++PHLLGTITN++G++ SMRAL SI +ELK 
Sbjct: 416 LSLAVHFHPHEAAFLLIDYKGGGMAQPFRNIPHLLGTITNIEGSKNFSMRAIASIKSELK 475 

Query: 575 RRQRLFAKADVNHINQYQKKYKLGEVSEPMPHLFLISDEFAELKSNQPEFMKELVSTARI 634 
+RQRLF + VNHIN Y K YK G+ MPHLFLISDEFAELKS +P+F++ELVS ARI 

Sbjct: 476 KRQRLFDQYQVNHI1TOYTKLYKQGKAEVAMPHLFLISDEFAELKSEEPDFIRELVSAARI 535 

Query: 635 GRSLGIHLIIATQKPSGVVDDQIWSNSRFKLALKVADRGDSMEMLHTPDAAEITQAGRAY 694 
GRSLG+HLILATQKP G++DDQIWSNSRFK+ALKV D DS E+L DAA IT GR Y 
Sbjct: 536 GRSLGVHLILATQKPGGIIDDQIWSNSRFKVALKVQDATDSKEILKNSDAANITVTGRGY 595 

Query: 695 LQVGNNEVYELFQSAWSGADYQPEKDDQGIEDHTIYSINDK3QYEILNDDLSGLDQAENI 754 

LQVGNNEVYELFQSAWSGA YE G ED I + D G LS +D +N 

Sbjct: 596 LQVGNNEVYELFQSAWSGAPYLEEV- - YGTEDE - IAIVTDTGLI PLSEVDTEDNA 647 

Query: 755 -KEVPTELDAIVENIQALTKEMGISDLPQPWLPPLSNQIAVTDLRKEGSVDLWSKAPSYK 813 

K+V TE++A+V+ 1+ + EMGI LP PWLPPL+ +1 T h+ 
Sbjct: 648 KKDVQTEIEAVVDEIERIQDEMGIEKLPSPWLPPLAERIPRT LFPSNEKDH 698 

Query: 814 AVLGFMDIPSQQAQEVAYHDFEDDGHLSIFAGPSMGKSTALQTVTMDLARHNSPEFLNLY 873 

++D P Q Q + +DG++ IF GKS A T M A +PE L++Y 

Sbjct: 699 FHFAYVDEPDLQRQAPIAYKMMEDGNIGIFGSSGYGKSIAAATFLMSFADVYTPEELHVY 758 

Query: 874 LFDFGTNGLLPLRRLPHVADFFTIDDDEKIAKFIARIKVEMSDRKKALSRYNVATAKLYR 933 
+FDFG LLPL +LPH AD+F +D KI KF+ RIK E+ RK+ ++ K+Y 

Sbjct: 759 IFDFGNGTLLPLAKLPHTADYFLMDQSRKIEKFMIRIKEEIDRRKRLFREKEISHIKMYN 818 

Query: 934 QVSGETMPQILIVIDSYEGLREAQTPTNLEACFQNISRDGSSLGISLVISAGRTAALRSS 993 
+S E +P I I ID+++ +++ LE+ F +SRDG SLGI +++A R A+R S 

Sbjct: 819 ALSEEELPFIFITIDNFDIVKDEM--HELESEFVQLSRDGQSLGIYFMLTATRVNAVRQS 876 

Query: 994 LMANLKERIALKLTDDSESRTLVGRHQHIMEDIPGRGLIKRDDIEVLQVALSTEGTETFD 1053 

L+ NLK +1 L D SE ++ GR + +E IPGR +I+++++ Q+ L + + 
Sbjct: 877 LLNNLKTKIVHYLMDQSEGYSIYGRPKFNLEPIPGRVIIQKEELYFAQMFLPVDADDDIG 936 

Query: 1054 I INNIQNESDAMNSKWTG- PRPKAI PIVPEELTFDDFMATDSVQADLSANRL- - PLGLEM 1110 

+ N ++++ + ++ +P IP++PE L+ + S++ L L P+GL 

Sbjct: 937 MFNELKSDVQKLQGRFASMEQPAP I PMLPESLSTREL SIRFKLERKPLSVPIGLHE 992 

Query: 1111 VDVESYSIALKRFKHMLYMSDSDESLEAVGSHIIKVLL 1148 

V L + KH L + + ++++KV+L 
Sbjct: 993 ETVSPVYFDLGKHKHCLILGQTQRG KTOVLKVML 1026 

There is also homology to SEQ ID 24. 

A related GBS gene <SEQ ID 8629> and protein <SEQ ID 8630> were also identified. Analysis of this 
protein sequence reveals the following: 

Homology to a bacterial toxin 

The protein has homology with the following sequences in the databases: 

>OMNI|NT01BS3725 diarrheal toxin 

Score = 203 bits (511) , Expect = 4e-51 

Identities = 123/377 (32%) , Positives = 198/377 (51%) , Gaps = 22/377 (5%) 

Query: 1 MGISDLPQPWLPPLSNQIAVTDLRKEGSVDLWSKAPSYKAVLGFMDIPSQQAQEVAYHDF 60 
MGI LP PWLPPL+ +1 T L+ ++D P Q Q + 

Sbjct: 704 MGIEKLPSPWLPPLAERIPRT LFPSNEKDHFHFAYVDEPDLQRQAPIAYKM 754 

Query: 61 EDDGHLSIFAGPSMGKSTALQTVTMDLARHNSPEFLNLYLFDFGTNGLLPLRRLPHVADF 120 

+DG++ IF GKS A T M A +PE L++Y+FDFG LLPL +LPH AD+ 

Sbjct: 755 MEDGNIGIFGSSGYGKSIAAATFLMSFADVYTPEELHVYIFDFGNGTLLPLAKLPHTADY 814 

Query: 121 FTIDDDEKIAKFIARIKVEMSDRKKALSRYNVATAKLYRQVSGETMPQILIVIDSYEGLR 180 

F +D KI KF+ RIK E+ RK+ ++ K+Y +S E +P I I ID+++ ++ 

Sbjct: 815 FLMDQSRKIEKFMIRIKEEIDRRKRLFREKEISHIKMYNALSEEELPFIFITIDNFDIVK 874 

Query: 181 EAQTPTNLEACFQNISRDGSSLGISLVISAGRTAALRSSLMANLKERIALKLTDDSESRT 240 

+ LE+ F +SRDG SLGI +++A R A+R SL+ NLK +1 L D SE + 

Sbjct: 875 DEM--HELESEFVQLSRDGQSLGIYFMLTATRVNAVRQSLLNNLKTKIVHYLMDQSEGYS 932 

Query: 241 LVGRHQHIMEDIPGRGLIKRDDIEVLQVALSTEGTETFDIINNIQNESDAMNSKWTG-PR 299 
+ GR + +E IPGR +I+++++ Q+ L + + + M ++++ + ++ + 

Sbjct: 933 IYGRPKFNLEPIPGRVIIQKEELYFAQMFLPVDADDDIGMFNELKSDVQKLQGRFASMEQ 992 

Query: 300 PKAI PIVPEELTFDDFMATDSVQADLSANRL- - PLGLEMVD VESYSLALNRFKHMLYMSD 357 
P IP++PE L+ + S++ L L P+GL V L + KH h + 

Sbjct: 993 PAPIPMLPESLSTREL SIRPKLERKPLSVPIGLHEETVSPVYFDLGKHKHCLILGQ 1048 

Query: 358 SDESLEAVGSHIIKVLL 374 

+ ++++KV+L 
Sbjct: 1049 TQRG KTNVLKVML 1061 

SEQ ID 8630 (GBS326) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 65 (lane 5; MW 66kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 71 (lane 5; MW 91kDa). 

GBS326-GST was purified as shown in Figure 212, lane 5. 

GBS326LN was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 127 (lane 2-4; MW 114kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 184 (lane 6; MW 114kDa). The purified protein 
is shown in Figure 236, lane 12. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 650 

A DNA sequence (GBSx0690) was identified in S.agalactiae <SEQ ID 2005> which encodes the amino 
acid sequence <SEQ ID 2006>. Analysis of this protein sequence reveals the following: 

Possible site: 16 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2693 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 651 

A DNA sequence (GBSx0691) was identified in S.agalactiae <SEQ ID 2007> which encodes the amino 
acid sequence <SEQ ID 2008>. Analysis of this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3933 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 652 

A DNA sequence (GBSx0692) was identified in S.agalactiae <SEQ ID 2009> which encodes the amino 
acid sequence <SEQ ID 2010>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.32 Transmembrane 225 - 241 ( 219 - 246) 

Final Results 

bacterial membrane --- Certainty=0.3930 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04693 GB:AP001510 unknown conserved protein [Bacillus halodurans] 
Identities = 83/320 (25%) , Positives = 162/320 (49%) , Gaps = 1/320 (0%) 

Query: 103 WFILHPSNLFLTKNATAKIAYRSLPGIMRPEKFGPEEFLYQFKCFVFALLTQHDYIELY 162 

++ 1+ P N+ ++ + + + P + PE + + + LL + Y 

Sbjct: 106 LHLIVSPENVLVSDGLDVTFIHYGVKDSIPPYETDPERLFLELRATLLVLLDGNHRFHEY 165 

Query: 163 NGAISVIEVSDFLKSIYHAETIQAvRDIITIDYEQQVEVETHTLAKVSRAKYKLYKYISV 222 

+++S KS+ T++ +R++I + Q+ E + L KV + K+ + K+ + 
Sbjct: 166 MNYHDTLKLSPEAKSLVQQTTLEGLRELIR-HWIQEHEQQEKQLHKVPKTKWTIQKWAGI 224 

Query: 223 WLGALSTILLIPLWLVFIHNPFKEKMIiAADTSFIKVDYNQVINRLEHVKVSKLPYTQKY 282 

LA +1 +VY++ P +E A+ +++ +Y+QVI+ LE + +P KY 

Sbjct: 225 GLIARLVPAIIYIVYVIAFLQPRQE^FTASHAAYLNFjNYSQVIDTLEPYSPNSMPRVVKY 284 

Query: 283 EliAYSYINGMSFSEEQREVIIjNNVTLKTDELYLDYWINIGRGLDDDAIDAAKRLDDSDLV 342 
+LA SY+ RE + N + L+ E Y DYWI IGRG ++ AID A+ L D + + 

Sbjct: 285 QLAQSYVAIEPLQAYHRENLKNVLVLQAAESYFDYWIAIGRGENEKAIDIARGLQDKEWL 344 

Query: 343 IYAIVQKMDQTOKDNSLSGKDREQK1SELQTDYDKXWKDRKTALTDEESKSKNSNNHSTN 402 

+YA V++ ++V+ D +LSGK+RE + E++ + D Y ++ + + E+ N+ ++N 
Sbjct: 345 VYANVKRREEVKSDENLSGKEREDLIKEIEAEIDDYMRELEELAEEGEAFQPNAEPAASN 404 

Query: 403 SNKESSESSSTTASTSSKTK 422 

+E + S + + K 
Sbjct: 405 ELEEDEGDTEEDDSDNQEAK 424 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 2010 (GBS337) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 62 (lane 3; MW 50.3kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 653 

A DNA sequence (GBSx0693) was identified in S.agalactiae <SEQ ID 2011> which encodes the amino 
acid sequence <SEQ ID 2012>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-14.01 Transmembrane 131 - 147 ( 122 - 153) 

Final Results 

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8631> which encodes amino acid sequence <SEQ ID 8632> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: 13.38 
GvH: Signal Score (-7.5): -1.25 
Possible site: 23 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 1 value: -14.01 threshold: 0.0 
INTEGRAL Likelihood =-14.01 Transmembrane 127 - 143 ( 118 - 149) 
PERIPHERAL Likelihood = 16.13 113 
modified ALOM score: 3.30 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8632 (GBS140) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 32 (lane 3; MW 43kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 49 (lane 8; MW 18kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 654 

A DNA sequence (GBSx0694) was identified in S.agalactiae <SEQ ID 2013> which encodes the amino 
acid sequence <SEQ ID 2014>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1486 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 655 

A DNA sequence (GBSx0695) was identified in S.agalactiae <SEQ ID 2015> which encodes the amino 
acid sequence <SEQ ID 2016>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq 



INTEGRAL Likelihood =-14.59 Transmembrane 984 -1000 ( 976 -1009) 
INTEGRAL Likelihood = -9.71 Transmembrane 19 - 35 ( 15 - 42) 
INTEGRAL Likelihood = -9.50 Transmembrane 872 - 888 ( 865 - 890) 
INTEGRAL Likelihood = -6.37 Transmembrane 927 - 943 ( 924 - 951) 
INTEGRAL Likelihood = -4.19 Transmembrane 831 - 847 ( 828 - 847) 
INTEGRAL Likelihood = -2.87 Transmembrane 899 - 915 ( 899 - 916) 

Final Results 

bacterial membrane --- Certainty=0.6838 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8633> which encodes amino acid sequence <SEQ ID 8634> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
SRCFLG: 0 
McG: Length of UR: 20 
Peak Value of UR: 3.40 
Net Charge of CR: 3 
McG: Discrim Score: 13.67 
GvH: Signal Score (-7.5): -3.27 

Possible site: 21 
>>> Seems to have an uncleavable N-term signal seq 
Amino Acid Composition: calculated from 1 



ALOM program count: 6 value: -14.59 threshold: 0.0
INTEGRAL Likelihood = -14.59 Transmembrane 973 - 989 ( 965 - 998)
INTEGRAL Likelihood = -9.71 Transmembrane 8 - 24 ( 4 - 31)
INTEGRAL Likelihood = -9.50 Transmembrane 861 - 877 ( 854 - 879)
INTEGRAL Likelihood = -6.37 Transmembrane 916 - 932 ( 913 - 940)
INTEGRAL Likelihood = -4.19 Transmembrane 820 - 836 ( 817 - 836)
INTEGRAL Likelihood = -2.87 Transmembrane 888 - 904 ( 888 - 905)
PERIPHERAL Likelihood = 3.82 936













modified ALOM score: 3.42 
icml HYPID: 7 CFP: 0.684 

*** Reasoning Step: 3 

Final Results

bacterial membrane Certainty=0.6838 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB86324 GB:AE000938 phage infection protein homolog 
[Methanothermobacter thermoautotrophicus] 
Identities = 96/454 (21%), Positives = 190/454 (41%), Gaps = 63/454 (13%)

Query: 1 MLKIKyiLGRIMKR-NNFRILWYIIAVALFLVAIAGLNLKLQGDHAKENKTTQSATNTKL 59 

M K I + MK N ++ ++IAV + + A+ + +Q ++T+ + 
Sbjct: 1 MRKALEIFWKDMKTVKNSPWLFVIAVIICIPALYAV-FNIQATLDPYSRTSS 1 53 

Query: 60 NIAriVNEDQNVSNGKESYNLGASYIKSIERDNSQNWSWSRGTAQNGLDKGDYQLMVIIP 119 

+A+VNED N+GA ++ + ++ + +W V R A +GL KG Y ++IIP 

Sbjct: 54 EVAVVNEDMGADKNGTHLNVGAEFVSELRKNRNFDWQFVDRSDAMDGLRKGKYYAVLIIP 113 

Query: 120 NNFSQKLLDWKANAEQTTISYKVNAKGNIALEK^^ 179

NFS LL+ Q+IYVNKN + + +++NS +V + 

Sbjct: 114 GNFS SDLLS I KNGTPRQAS I KYMVNDKLNPVAPRITNAGADALQAKINSE VVKTIDG I VF 173 

Query: 180 SNLYTAQENVQA MVNVQSGNI SNYQKNLLDSATNF QNI FPAL 221 

+ A E +A VN +GN+ + I) + ++ QN++ +L

Sbjct: 174 GKISEAGELARANRDDILRTKRFVNELNGNLGKIDETLSTANSDLEKGQNLWSSLKTDLP 233 

Query: 222 -VNQSSSSITANESLKKS LEASDNMFNDLVTTQTNTGKDLSSL 263 

+ +++ + SL +S +++ ++ ++ +T+ L+SL 

Sbjct: 234 EIRDNANFVKEKYSLLESYIGKDPAKALSTVQSMESHLSEAITSMKYLRAVLASLYSATG 293

Query: 264 -IEQRHQDSISYEAFSTSLLEMNNELLEKQLSDliTQAQKDQETLSSQLNSIMG 316 

I+Q + + L + ++L K+DI + ++ + SLN+M 

Sbjct: 294 DPKLKTAIDQIDTNIEKASSVLGILQTIESDLKTKGTTDRIVKLKASIDRMDSALNKLMD 353 

Query: 317 D-DNNHNHKENSSAYLNVARQKIQELSEALKSQDNIAKDQSEQLDKIVREGLASYFAKNN 375 

D +++SA L +A + + A+ +D S +L+ I + L S + 

Sbjct: 354 SRDEIDAAMQDASAKLGIANARWPTMRSAI QDASRKLNMISDDDLNSLVKLAD 406 

Query: 376 KDNITLLELLKSHSTNEK TLKDFKAKVADF 405

D + E +S EK +K++ + +A F' 

Sbjct: 407 IDPSAVREYFRSPVRMEKEHIYPVKNYGSALAPF 440 
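
The alignment summary line for this hit ("Identities = 96/454 (21%), Positives = 190/454 (41%), Gaps = 63/454 (13%)") can be checked arithmetically. The Python sketch below is an illustration only: the helper names are assumptions, and the observation that the printed percentages match integer truncation of the fraction is made for this line alone, not as a statement about how the alignment program reports percentages in general.

```python
import re

# Assumed layout of a summary field: "Name = num/den (pct%)".
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (numerator, denominator, reported_percent)}."""
    return {
        name: (int(num), int(den), int(pct))
        for name, num, den, pct in FIELD_RE.findall(line)
    }


def truncated_percent(num, den):
    """Percentage with the fractional part discarded (integer arithmetic)."""
    return 100 * num // den


summary = parse_summary(
    "Identities = 96/454 (21%), Positives = 190/454 (41%), Gaps = 63/454 (13%)"
)
for name, (num, den, reported) in summary.items():
    print(name, num, den, reported, truncated_percent(num, den))
```

For this line, 96/454 is about 21.1%, 190/454 about 41.9% and 63/454 about 13.9%, so the printed 21%, 41% and 13% agree with truncation rather than rounding.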

SEQ ID 8634 (GBS250) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 47 (lane 4; MW 136kDa).

GBS250-GST was purified as shown in Figure 203, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 656 

A DNA sequence (GBSx0696) was identified in S.agalactiae <SEQ ID 2019> which encodes the amino
acid sequence <SEQ ID 2020>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5009 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA46375 GB:X65276 ORFA1 [Clostridium acetobutylicum]
Identities = 35/91 (38%), Positives = 53/91 (57%)

Query: 1 MAQIKLTPEELRSSAQKyTAGSQQVTEVLI&LTQEOAVIDENWDGSTFDSFEAQENELSP 60

MAQI +TPEEL+S AQ Y ++++++ + I E W G F ++ Q+N+L 
Sbjct: 1 MAQISVTPEELKSQAQVYIQSKEEIDQAIQKVNSMNSTIAEEWKGQAFQAYLEQYNQLHQ 60 

Query: 61 KITEFAQLLEDINQQLLKVADIIEQTDADIA 91 
+ +F LLE +NQQL K AD + + DA A

Sbjct: 61 TWQFENLLESVNQQLNKYADTVAERDAQDA 91 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 657 

A DNA sequence (GBSx0697) was identified in S.agalactiae <SEQ ID 2021> which encodes the amino 
acid sequence <SEQ ID 2022>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3741 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 658 

A repeated DNA sequence (GBSx0698) was identified in S.agalactiae <SEQ ID 2023> which encodes the 
amino acid sequence <SEQ ID 2024>. This protein is predicted to be carbamoylphosphate synthetase 
(carB). Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -1.33 Transmembrane 807 - 823 ( 807 - 823) 

Final Results 

bacterial membrane Certainty=0.1532 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA03928 GB:AJ000109 carbamoylphosphate synthetase [Lactococcus
lactis]

Identities = 771/1062 (72%), Positives = 901/1062 (84%), Gaps = 5/1062 (0%)

Query: 1 MPKRTDIRKIMVIGSGPIVIGQAAEFDYSGTQACLSLKEEGYQWLVNSNPATIMTDKDI 60 
MPKR DI+KIM+IGSGPI+IGQAAEFDY+GT+ACL+LKEEGY+WLVNSNPATIMTD++I

Sbjct: 1 MPKENDIKKIMIIGSGPIIIGQAAEFDYAGTEACLRLKEEGYEVVLvNSNPATIMTDREI 60 

Query: 61 ADKVYIEPITLEFVTRILRKERPDALLPTLGGQTGIiNMAMALSKN^ 120

AD WIEPITLEFV++ILRKERPDALDPTLGGQTGLNMAM LSK GILEELNVELLGTKL 
Sbjct: 61 ADTWIEPITLEFVSKILRKERPDAIiPTLGGQTGLNMA^ 120 

Query: 121 SAIDKAEDRDLFKQLMEEIJtfQPIPESEIVNSVEEAIQFAEQIGYPLIVRPAFTLGGTGGG 180

SAID+AEDR+LFK+L E +N+P+ S+I +VEEAI A++IGYP+IV PAFT+GGTGGG 
Sbjct: 121 SAIDQAEDRELFKELCESINEPLCASDIATTVEEAINIADKIGYPIIVGPAFTMGGTGGG 180 

Query: 181 MCDNQEQLVDITTKGLKLSPVTQCLIERSIAGFKEIEYEVMRDAADNALWCNMENFDPV 240 
+CD +E+L +I GLKLSPVTQCLIE SIAG+KEIEYEVMRD+ADNA+WCNMENFDPV

Sbjct: 181 ICDTEEELREIVANGLKLSPVTQCLIEESIAGYKEIEYEVMRDSADNAIWCNMENFDPV 240 

Query: 241 GIHTGDSIVFAPAQTLSDVENQLLRDASLDIIRALKIEGGCNVQLALDPNSFKYYVIEVN 300 
G+HTGDS IVFAP+QTLSD E Q+LRDASL+ 1 IRALKI EGGCNVQIiALDPNS + +Y VIEVN 
Sbjct: 241 GVHTGDSIVFAPSQTLSDNEYQMLRDASLNIIRALKIEGGCNVQLALDPNSYEYRVIEVN 300

Query: 301 PRVSRSSAIASKATGYPIAKLAAKIAVGLTLDEVINPITKTTYAMFEPALDYVVAKMPRF 360 

PRVSRSSALASKATGYPIAK++AKIA+G+TLDE+INP+T TYAMFEPALDYWAK+ RF 
Sbjct: 301 PRVSRSSALASKATGYPIAKMSAKIAIGMTLDEIINPVTNKTYAMFEPALDYWAKIARF 360 

Query: 361 PFDKFESGDRKLGTQMKATGEVMAIGRNIEESLLKACRSLEIGVDHIKIADLDNVSDDVL 420 

PFDKFE+GDR LGTQMKATGEVMAIGRNIEESLLKA RSLEIGV H ++ + D+ L 

Sbjct: 361 PFDKFENGDRHLGTQMKATGEVMAIGRNIEESLLKAVRSLEIGVFHNEMTEAIEADDEKL 420 

Query: 421 LEKIRKAEDDRLFYLAEALRRHYSIEKLASLTSIDSFFLDKLRVIVELEDLLSKNRLDIN 480

EK+ K +DDRLFY+ +EA+RR IE++A LT ID FFLDKL IVE+E+ L N + 
Sbjct: 421 YEKMVKTQDDRLFYVSEAIRRGIPIEEIADLTKIDIFFLDKLLYIVEIENQLKVNIFEPE 480 

Query: 481 ILKKVKNKGFSDKAIASLWQINEDQVRNMRKEAGILPVYKMVDTCASEFDSATPYFYSTY 540 
+LK K GFSD+ IA LW + ++VR R+E I+PVYKMVDTCA+EF+S+TPYFYSTY

Sbjct: 481 LLKTAKKNGFSDREIAKLWim'PEEVRRRRQENKIIPVYKMVDTCAAEFESSTPYFYSTY 540 

Query: 541 AVENESLI SDKAS I LVLGSGPIRIGQGVEFDYATVHSVKAIRESGFEAI I MNSNPETVST 600 
ENES SDK I+VLGSGPIRIGQGVEFDYATVH VKAI+ G EAI + +NSNPETVST 
Sbjct: 541 EWENESKRSDKEKIIVLGSGPIRIGQGVEFDYATVHCVKAIQALGKEAIVINSNPETVST 600

Query: 601 DFSISDKLYFEPLTFEDVMNVIDLEKPEGVILQFGGQTAINLAKDLNKAGVKILGTQLED 660 

DFS I SDKLYFEPLTFEDVMNVIDLE + P VI+QFGGQTAINLA+ L+KAGVKILGTQ+ED 
Sbjct: 601 DFSISDKLYFEPLTFEDVMNVIDLEEPLWIVQFGGQTAINLAEHLSKAGVKILGTQVED 660 

Query: 661 LDRAENRKQFFATLQALNIPQPPGFTATTEEEAVNAAQKIGYPVLVRPSYVLGGRAMKIV 720 

LDRAE+R FE LQ L+IPQPPG TAT EEEAV A KIGYPVL+RPS + VLGGRAM+ 1 + 
Sbjct: 661 LDRAEDRDLFEKALQDLDIPQPPGATATNEEEAVANANKIGYPVLIRPSFVLGGRAMEII 720 

Query: 721 ElffiEDLRHYMTTAVKASPDHPVLIDAYLIGKECEVDAISDGQNILIPGIMEHIERSGVHS 780

NE+DLR YM AVKASP+HPVL+D+YL G+ECEVDAI DG+ +L+PGIMEHIER+GVHS 
Sbjct: 721 N1TOKDLRDYMNRAVKASPEHPVLVDSYLQGQECEVDAICDGKEVLLPGIMEHIERAGVHS 780 

Query: 781 GDSMAVYPPQTLSETIIETIVDYTKRIAIGLNCIGMMNIQFVIKDQKVYVIEVNPRASRT 840 
GDSMAVYPPQ LS+ II+TIVDYTKRLAIGLNCIGMMNIQFVI +++VYVIEVNPRASRT

Sbjct: 781 GDS^VYPPQNLSQAIIDTIVDYTKRIAIGIiNCIGMMNIQFVIYEEQVYVIEVNPRASRT 840 

Query: 841 LPFLSKVTHIPMAQVATKVILGDKLCNFTYGYDLYPASDMVHIKAPVFSFTKLAKVDSLL 900 
+PFLSKVT+IPMAQ+AT++ILG+ L + Y LP DMVH+KAPVFSFTKLAKVDSLL 
Sbjct: 841 VPFLSK^7TNIPMAQIATQMILGFJNLKDLGYEAGIAPTPD^WHVKAPVFSFTKLAKVDSLL 900

Query: 901 GPEMKSTGEVMGSDINLQKALYKAFEAAYLHMPDYGNIVFTVDDTDKEEALELAKVYQSI 960 

GPEMKSTG MGSD+ L+KALYK+FEAA LHM DYG+++FTV D DKEE L LAK + I 
Sbjct: 901 GPEMKSTGLAMGSDVTLEKALYKSFEAAKLHMftDYGSVLFTVADEDKEETLALAKDFAEI 960 

Query: 961 GYRIYATQGTAIYFDANGLETVLVGKL--GENDRiraiPDLIKNGKIQAVINTVGQNNID- 1017 

GY + AT GTA + NGL V KL GE++ + + 1+ G++QAV+NT+G 
Sbjct: 961 GYSLVATAGTAAFLKENGLYVREVEKLAGGEDEEGTLVEDIRQGRVQAVYNTMGNTRASL 1020 

Query: 1018 --NHDALIIRRSAIEQGVPLFTSLDTAHAMFKVLESRAFTLK 1057

D IR+ AI +G+PLFTSLDT A+ KV++SR+FT K 
Sbjct: 1021 TTATDGFRIRQEAISRGIPLFTSLDTVAAILKVMQSRSFTTK 1062 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2025> which encodes the amino acid
sequence <SEQ ID 2026>. Analysis of this protein sequence reveals the following:

Possible site: 21 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.17 Transmembrane 773 - 789 ( 773 - 789)

Final Results 

bacterial membrane Certainty=0.1468 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA03928 GB:AJ000109 carbamoylphosphate synthetase [Lactococcus 
lactis] 

Identities = 753/1030 (73%), Positives = 876/1030 (84%), Gaps = 6/1030 (0%)

Query: 1 LALKEEGYKVILVNSNPATIMTDKEIADKVYIEPLTLEFVNRIIRKERPDAILPTLGGQT 60 

LALKEEGY+V+LVNSNPATIMTD+EIAD VYIEP+TLEFV++I+RKERPDA+LPTLGGQT 
Sbjct: 35 LALKEEGYEWLVNSNPATIMTDREIADTVYIEPITLEFVSKILRKERPDALLPTLGGQT 94 

Query: 61 GLNMAMALSKAGILDDLEIELLGTKLSAIDQAEDRDLFKQLMQELDQPIPESTIVKTVDE 120 

GLNMAM LSK GIL++L +ELLGTKLSAIDQAEDR+LFK+L + +++P+ S I TV+E 
Sbjct: 95 GI^NMAMELSKTGILEELNVELLGTKLSAIDQAEDRELFKELCESINEPLCASDIATTVEE 154 

Query: 121 AVTFARDIGYPVIVRPAFTLGGTGGGICSSEEELCEITENGLKLSPVTQCLIERSIAGFK 180

A+ A IGYP+IV PAFT+GGTGGGIC +EEEL EI NGLKLSPVTQCLIE SIAG+K 
Sbjct: 155 AINIADKIGYPIIVGPAFTMGGTGGGICDTEEELREIVANGLKLSPVTQCLIEESIAGYK 214 

Query: 181 EIEYEVMRDSADNALWCNMENFDPVGIHTGDSIVFAPTQTLSDIENQMLRDASLKIIRA 240 
EIEYEVMRDSADNA+WCNMEOTTJPVG+HTGDSIVFAP+QTLSD E QMLRDASL I IRA

Sbjct: 215 EIEYEVMRDSADNAIWCNMENFlDPVGVHTGDSIVFAPSQTLSnNEYQMLRDASLNIIRA 274 

Query: 241 LKIEGGCNVQIVUjDPYSFKYYVIEvNPRVSRSSALASKATGYPIAKLAAKIAVGLTLDEM 300 
LKIEGGCNVQLALDP S++Y VIEVNPRVSRSSALASKATGYPIAK++AKIA+G+TLDE+ 
Sbjct: 275 LKIEGGCNVQLALDPNSYEYRVIEVNPRVSRSSALASKATGYPIAKMSAKIAIGMTLDEI 334

Query: 301 INPITGTTYAMFEPALDYWAKIPRFPFDKFEHGERQLGTQMKATGEVMAIGRNLEESLL 360 

INP+T TYAMFEPALDYWAKI RFPFDKFE+G+R LGTQMKATGEVMAIGRN+EESLL 
Sbjct: 335 INPVTNKTYAMFEPALDYWAKIARFPFDKFENGDRHLGTQMKATGEVMAIGRNIEESLL 394 

Query: 361 KACRSLEIGVCHNEMTSLSNISDEELVTKVIKAQDDRLFYLSEAIRRGYSIEELESLTKI 420 

KA RSLEIGV HNEMT DE+L K++K QDDRLFY+SEAIRRG IEE+ LTKI 

Sbjct: 395 KAVRSLEIGVFHNEMTEAIEADDEKLYEKMVKTQDDRLFYVSEAIRRGIPIEEIADLTKI 454 

Query: 421 DLFFLDKLLHIVEIEQELQMHVDHLESLKKAKRYGFSDQKIAEIWQKDESDIRAMRHSHS 480

D+FFLDKLL+IVEIE +L++++ E LK AK+ GFSD++IA++W ++R R + 

Sbjct: 455 DIFFLDKLLYIVEIENQLKVNIFEPELLKTAKKNGFSDREIAKLWNVTPEEVRRRRQENK 514 

Query: 481 LYPVYKMVDTCAAEFDAKTPYFYSTYELENESVQSNKESILVLGSGPIRIGQGVEFDYAT 540 
+ PVYKMVDTCAAEF++ TPYFYSTYE ENES +S+KE I+VLGSGPIRIGQGVEFDYAT

Sbjct: 515 IIPVYKMVDTCAAEFESSTPYFYSTYEWENESKRSDKEKIIVLGSGPIRIGQGvEFDYAT 574 

Query: 541 VHSVKAIQKAGYEAIIMNSNPETVSTDFSVSDKLYFEPLTFEDVMNVIDLEQPKGVIVQF 600 
VH VKAIQ G EAI++NSNPETVSTDFS+SDKLYFEPLTFEDVMNVIDLE+P VIVQF 
Sbjct: 575 VHCVKAIQALGKEAIVINSNPETVSTDFSISDKLYFEPLTFEDVMNVIDLEEPLWIVQF 634

Query: 601 GGQTAINLAQALSEAGVTILGTQVEDLDRAEDRDLFEKALKELGIPQPQGQTATNEEEAL 660 

GGQTAINLA+ LS+AGV ILGTQVEDLDRAEDRDLFEKAL++L IPQP G TATNEEEA+ 
Sbjct: 635 GGQTAINLAEHLSKAGVKILGTQVEDLDRAEDRDLFEKALQDLDIPQPPGATATNEEEAV 694 

Query: 661 EAAKKIGFPVLVRPSYVLGGRAMEIVENKEDLREYIRTAVKASPEHPILVDSYIFGKECE 720 

A KIG+P VL+RPS +VLGGRAME I + N++DLR+Y+ AVKAS PEHP+LVDS Y+ G+ECE 
Sbjct: 695 ANANKIGYPVLIRPSFVLGGRAMEIINNEKDLRDYMNRAVKASPEHPVLVDSYLQGQECE 754 

Query: 721 VDAISDGKSVLIPGIMEHIERAGVHSGDSMAVYPPQQLSKQIQETIAEYTKRLAIGLNCI 780
VDAI DGK VL+PGIMEHIERAGVHSGDSMAVYPPQ LS+ I +TI +YTKRLAIGLNCI
Sbjct: 755 VDAICDGKEVLLPGIMEHIERAGVHSGDSMAVYPPQNLSQAIIDTIVDYTKRIAIGLNCI 814 

Query: 781 GMMNVQFVI KNEQ VYVT EVNPRASRTVPFLSKVTGI PMAQI ATKLI LGQTLKDLGYEDGL 840 
GMMN+QFVI EQVYVIEVNPRASRTVPFLSKVT IPMAQ+AT++ILG+ LKDLGYE GL

Sbjct: 815 GMMNIQFVIYEEQVYVIEVNPRASRTVPFLSKS7TNIPMAQLATQMILGENLKDLGYEAGL 874 

Query: 841 YPQSPLVHIKRPVFSFTKLAQVDSLLGPEMKSTGEVMGSDTSLEKALYKAFEANNSHLSE 900 
P +VH+KAPVFSFTKLA+VDSLLGPEMKSTG MGSD +LEKALYK+FEA H+++ 
Sbjct: 875 APTPDMVHVKAPVFSFTKLAKVDSLLGPEMKSTGLAMGSDVTLEKALYKSFEAAKLHMAD 934

Query: 901 FGQIVFTIADDSKAEALSLARRFKAIGYQIMATQGTAAYFAEQGLSACLVGKIGDAANDI 960 

+G ++FT+AD+ K E L+LA+ F IGY ++AT GTAA+ E GL V K+ ++ 
Sbjct: 935 YGSVLFWADEDKEETLALAKDFAEIGYSLVATAGTAAFLKENGLYVREVEKLAGGEDEE 994 

Query: 961 PTLV RHGHVQAIVNTVGIKR TADKDGQMIRSSAIEQGVPLFTALDTAKAMLTVL 1014 

TLV R G VQA+VNT+G R T DG IR AI +G+PLFT+LDT A+L V+ 
Sbjct: 995 GTLVEDIRQGRVQAVVOTMGNTRASLTTATDGFRIRQEAISRGIPLFTSLDTVAAILKVM 1054 

Query: 1015 ESRCFNIEAI 1024

+SR F +1 
Sbjct: 1055 QSRSFTTKNI 1064 
Identities = 141/389 (36%), Positives = 222/389 (56%), Gaps = 16/389 (4%)

Query: 518 ESILVLGSGPIRIGQGVEFDYATVHSVKAIQKAGYEAIIMNSNPETVSTDFSVSDKLYFE 577

+ I+++GSGPI IGQ EFDYA + A+++ GYE +++NSNP T+ TD ++D +Y E 
Sbjct: 8 KKIMIIGSGPIIIGQAAEFDYAGTFA.CLALKEEGYEVVLVNSNPATIMTDREIADTVYIE 67 

Query: 578 PLTFEDVMNVIDLEQPKGVIVQFGGQTAIMLAQALSEAG VTILGTQVEDLDRAE 631 

P+T E V ++ E+P ++ GGQT +N+A LS+ G V +LGT++ +D+AE

Sbjct: 68 PITLEFVSKILRKERPDALLPTLGGQTGLM4AMELSKTGILEEUTTOLLGTKLSAIDQAE 127 

Query: 632 DRDLFEKALKELGIPQPQGQTATNEEEALEAAKKIGFPVLVRPSYVLGGRAMEIVENKED 691 
DR+LF++ + + P AT EEA+ A K1G+P++V P++ +GG I + +E+ 

Sbjct: 128 DRELFKELCESINEPLCASDIATTVEEAINIADKIGYPI IVGPAFTMGGTGGGI CDTEEE 187

Query: 692 LREYIRTAVKASPEHPILVDSYIFG-KECEVDAISD-GKSVLIPGIMEHIERAGVHSGDS 749 

LRE + +K SP L++ IGKEE + + D +++ ME+ + GVH+GDS 

Sbjct: 188 LREIVANGLKLSPVTQCLIEESIAGYKEIEYEVMRDSADNAIWCNMENFDPVGVHTGDS 247 

Query: 750 MAvYPPQQLSKQIQETIAEYTKRLA1G™CIGMMNVQFVI--KNEQVYVIEVNPRASRTV 807 

. + PQLS +++++ L G NVQ + + + VIEVNPR SR+ 

Sbjct: 248 IVFAPSQTLSDNEYQMLRDASLNIIRALKIEGGCNVQLALDPNSYEYRVIEVNPRVSRSS 307 

Query: 808 PFLSKVTGIPMAQIATKLILGQTLKDL- -GYEDGLY PQSPbVHIKAPVFSFTKLAQ 861

SK TG P+A+++ K+ +G TL ++ + Y P V K F F K 

Sbjct: 308 ALASKATGYPIAKMSAKIAIGMTLDEIINPVTNKTYAMFEPALDYWAKIARFPFDKFEN 367 

Query: 862 VDSLLGPEMKSTGEVMGSDTSLEKALYKA 890 
D LG +MK+TGEVM ++E++L KA

Sbjct: 368 GDRHLGTQMKATGEVMAIGRNIEESLLKA 396 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 777/1025 (75%), Positives = 896/1025 (86%), Gaps = 1/1025 (0%)

Query: 35 LSLKEEGYQWLVNSNPATIMTDKDIADKVYIEPITLEFVTRILRKERPDALLPTLGGQT 94 

L+LKEEGY+V+LVNSNPATIMTDK+IADKVYIEP+TLEFV RI+RKERPDA+LPTLGGQT 
Sbjct: 1 LALKEEGYKVIIiWSNPATIMTDKEIADKVZIEPLTLEFVNRIIRKERPDAIIiPTLGGQT 60 

Query: 95 GLNMAMALSKNGILEELKTVELL^ 154

GLNMAMALSK GIL++L +ELLGTKLSAID+AEDRDLFKQLM+EL+QPIPES IV +V+E 
Sbjct: 61 GLNMAMALSKAGILDDLEIELLGTKLS^^ 120 

Query: 155 AIQFAEQIGYPLIVRPAFTLGGTGGGMCDNQEQLVDITTKGLKLSPVTQCLIERSIAGFK 214 
A+ FA IGYP+IVRPAFTLGGTGGG+C ++E+L +IT GLKLSPVTQCLIERSIAGFK

Sbjct: 121 AVTFARDIGYPVIVRPAFTLGGTGGGICSSEEELCEITENGLKLSPVTQCLIERSIAGFK 180 

Query: 215 EIEYEVMRDAADNALWCNMENFDPVGIHTGDSIVFAPAQTLSDVENQLLRDASLDIIRA 274

EIEYEVMRD+ADNALWCNMENFDPVGIHTGDSIVFAP QTLSD+ENQ+LRDASL IIRA 
Sbjct: 181 EIEYEVMRDSADNALWCNMENFDPVGIHTGDSIVFAPTQTLSDIENQMLRDASLKIIRA 240 

Query: 275 LKIEGGCOTQIALDPNSFKYYVIEVNPRVSRSSALASKATGYPIAKLAAKIAVGLTLDEV 334

LKIEGGCNVQIALDP SFKYYVIEVNPRVSRSSALASKATGYPIAKLARKIAVGLTLDE+ 
Sbjct: 241 LKIEGGCNVQLALDPYSFKYYVIEVNPRVSRSSALASKATGYPIAKIAAKIAVGLTLDEM 300 

Query: 335 INPITKTTYAMFEPALDYWAKMPRFPFDKFESGDRKLGTQMKATGEVMAIGRNIEESLL 394 
INPIT TTYAMFEPALDYWAK+PRFPFDKFE G+R+LGTQMKATGEVMAIGRN+EESLL

Sbjct: 301 INPITGTTYAMFEPALDYWAKIPRFPFDKFEHGERQLGTQMKATGEVMAIGRNLEESLL 360 

Query: 395 KACRSLEIGVDHIKlADLDNVSDDVLLEKIRKAEDDRLFYIiAEALRRHYSIEKLASLTSI 454 
KACRSLEIGV H ++ L N+SD+ L+ K+ KA+DDRLFYL+EA+RR YSIE+L SLT I 
Sbjct: 361 KACRSLEIGVCHNEMTSLSNISDEELVTKVIKAQDDRLFYbSEAIRRGYSIEELESLTKI 420

Query: 455 DSFFLDKLRVIVELEDLLSKmLDINILKKVKNKGFSDKAIASLWQINEDQVRNMRKEAG 514 

D FFLDKL IVE+E L + + LKK K GFSD+ IA +WQ +E +R MR 
Sbjct: 421 DLFFLDKLLHIVEIEQELQMHVDHLESLKKAKRYGFSDQKIAEIWQKDESDIRAMRHSHS 480 

Query: 515 ILPVYKMVDTCASEFDSATPYFYSTYAVENESLISDKASILVLGSGPIRIGQGVEFDYAT 574 

+ PVYKMVDTCA+EFD+ TPYFYSTY +ENES+ S+K SILVLGSGPIRIGQGVEFDYAT 
Sbjct: 481 LYPVYKMVDTCAAEFDAKTPYFYSTYELENESVQSNKESILVLGSGPIRIGQGVEFDYAT 540 

Query: 575 VHSVKAIRESGFEA1IMNSNPETVSTDFSISDKLYFEPLTFEDVMNVIDLEKPEGVILQF 634

VHSVKAI+++G+EAIIMNSNPETVSTDFS+SDKLYFEPLTFEDVMNVIDLE+P+GVI+QF 
Sbjct: 541 VHSVKAIQKAGYEAI IMNSNPETVSTDFSVSDKLYFEPLTFEDVMNVIDLEQPKGVIVQF 600 

Query: 635 GGQTAINLAKDLNKAGVKILGTQLEDLDRAENRKQFEATLQALNIPQPPGFTATTEEEAV 694 
GGQTAINLA+ L++AGV ILGTQ+EDLDRAE+R FE L+ L IPQP G TAT EEEA+

Sbjct: 601 GGQTAINLAQALSEAGVTILGTQVEDIjDRAEDRDLFEKALKELGIPQPQGQTATNEEEAL 660 

Query: 695 NAAQKIGYPVLWPSYVLGGRRMKIVENEEDLRHYMTTAVKASPDHPVLIDAYLIGKECE 754 
AA+KIG+PVLVRPSYVLGGRAM+IVEN+EDLR Y+ TAVKASP+HP+L+D+Y+ 1 GKECE 
Sbjct: 661 EAAKKIGFPVLWPSYVLGGRAME1VENKEDLREYIRTAVKASPEHPILVDSYIFGKECE 720

Query: 755 VDAlSDGQNILIPGIMEHIERSGVHSGDSMAVYPPQTLSETIIETIVDYTKRLAIGIiNCI 814 

VDAISDG+++LIPGIMEHIER+GVHSGDSMAVYPPQ LS+ I ETI +YTKRLAIGLNCI 
Sbjct: 721 VDAISDGKSVLIPGIMEHIERAGVHSGDSMAVYPPQQLSKQIQETIAEYTKRLAIGLNCI 780 

Query: 815 GMIWIQWIKDQKAmriEWPRASRTLPFLSKOTHIPMAQVATKVILGDKLCNFTYGYDL 874 

GMMN+QFVI K+ + +VYVIEVNPRASRT+ PFLSKVT I PMAQ+ATK+ ILG L + Y L 
Sbjct: 781 GIWWQFVIKNEQVYVIEWPRASRTVPFLSKVTGIPMAQIATKLILGQTLKDLGYEDGL 840 

Query: 875 YPASDMVHIKAPVFSFTIGjAKrTOSLLGPEMKSTGEVMGSDINLQKftLYKAFEAAYLHMPD 934

YP S +VHIKAPVFSFTKLA+VDSLLGPEMKSTGEVMGSD +L+KALYKAFEA H+ + 
Sbjct: 841 YPQSPLVHIKAPVFSFTKLAQVDSLLGPEMKSTGEVMGSDTSLEKALYKAFEANNSHLSE 900 

Query: 935 YGNIVFTVDDTDKEEALELAKVYQSIGYRIYATQGTAIYFDANGLETVLVGKLGENDRNH 994 
+G IVFT+ D K EAL LA+ +++IGY+I ATQGTA YF GL LVGK+G+ N

Sbjct: 901 FGQIVFTIADDSKAEALSLARRFKAIGYQIMATQGTAAYFAEQGLSACLVGKIGD-AAND 959 

Query: 995 IPDLIKKGKIQAVINTVGQNNIDNHDALIIRRSAIEQGVPLFTSLDTAHAMFKVLESRAF 1054 
IP L+++G +QA++NTVG + D +IR SAIEQGVPLFT+LDTA AM VLESR F 

Sbjct: 960 IPTLVRHGWQAIVNTVGIKRTADKDGQMIRSSAIEQGVPLFTALDTAKAMLTVLESRCF 1019

Query: 1055 TLKVL 1059 
++ + 

Sbjct: 1020 NIEAI 1024 
Identities = 145/387 (37%), Positives = 229/387 (58%), Gaps = 16/387 (4%)

Query: 10 IMVIGSGPIVIGQAAEFDYSGTQACLSLKEEGYQWLVNSNPATIMTDKDIADKVYIEPI 69 

I+V+GSGPI IGQ EFDY+ + ++++ GY+ +++NSNP T+ TD ++DK+Y EP+ 
Sbjct: 520 ILVLGSGPIRIGQGVEFDYATVHSVKAIQKAGYEAIIMNSNPETVSTDFSVSDKLYFEPL 579 



Query: 70 TLEFVTRILRKERPDALLPTLGGQTGLNMAMALSKNG1LEELNVELLGTKLSAIDKAEDR 129 

T E V ++ E+P ++ GGQT +N+A ALS+ G V +LGT++ +D+AEDR 

Sbjct: 580 TFEDVMlSfVIDLEQPKGVIVQFGGQTAINLAQALSEAG VTILGTQVEDLDRAEDR 633 

Query: 130
DLF++ ++EL P P+ + + EEA++ A++IG+P++VRP++ LGG + +N+E L
Sbjct: 634

Query: 190
+K SP L++ IGKEE++D +L+ ME+ + G+H+GDS+
Sbjct: 694

Query: 250
P Q IS + + + ++ L G MVQ + + + YVIEVNPR SR+
Sbjct: 752 VYPPQQLSKQIQETIAEYTKRLAIGLNCIGMMNVQFVI--KNEQVYVIEVNPRASRTVPF 809

Query: 310
SK TG P4A++A K+ +G TL +4- Y P V K P F F K D
Sbjct: 810 LSKVTGIPMAQIATKLILGQTLKDL--GYEDGLYPQSPLVHIKAPVFSFTKLAQVD 863

Query: 370
LG +MK+TGEVM ++E++L KA
Sbjct: 864



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 659 

A DNA sequence (GBSx0699) was identified in S.agalactiae <SEQ ID 2027> which encodes the amino
acid sequence <SEQ ID 2028>. This protein is predicted to be carbamoyl phosphate synthetase small 
subunit (carA). Analysis of this protein sequence reveals the following: 



Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2401 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB89872 GB:AJ132624 carbamoyl phosphate synthetase small 
subunit [Lactococcus lactis] 
Identities = 242/355 (68%), Positives = 305/355 (85%)

Query: 2 KRLLLLEDGSVFEGEAFGADVETSGEIVFSTGMTGYQESITDQSYNGQIITFTYPLIGNY 61
KRLL+LEDG++FEGEA GA+++ +GE+VF+TGMTGYQESITDQSYNGQI+TFTYP++GNY
Sbjct: 3 KRLLILEDGTIFEGEALGANLDVTGELVFNTGMTGYQESITDQSYNGQILTFTYPIVGNY 62

Query: 62
G+NRDDYESI PTCK W++E A PSNWR QM+ DEFLK K I PGI +G+DTRA+TKI +R
Sbjct: 63

Query: 122
+HGTMKA L+ + + + LQ +VL +Q+E ST AY SP G+ +V+VDFGLKH
Sbjct: 123

Query: 182
SILRELS+R+C++TWP+ T+A+EIL + PDGV+L+NGPG+P +P A++MI+E+QGKIP
Sbjct: 183

Query: 242
IFGIC+GHQLF+ ANGA TYKM FGHRGFNHAVR + TG++DFTSQNHGYAVS E+ PE
Sbjct: 243

Query: 302
L ITH EIND +VEGVRHKY+ PAFS VQFHPDAAPGPHD SYLFD+F++++D+F++
Sbjct: 303 LMITHVEINDNSVEGVRHKYFPAFSVQFHPDAAPGPHDASYLFDDFMDLMDNFKK 357

A related DNA sequence was identified in S.pyogenes <SEQ ID 2029> which encodes the amino acid
sequence <SEQ ID 2030>. Analysis of this protein sequence reveals the following:

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3534 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 265/354 (74%), Positives = 309/354 (86%)

Query: 2 KRLLLLEDGSVFEGEAFGADVETSGEIVFSTGMTGYQESITDQSYNGQIITFTYPLIGNY 61 

KRLL+LEDG++FEGE FGAD++ +GEI VF+TGMTGYQESITDQSYNGQI+TFTYPLIGNY 
Sbjct: 3 KRLLILEDGTIFEGEPFGADIDVTGEIVFNTGMTGYQESITDQSYNGQILTFTYPLIGNY 62 

Query: 62 GINRDDYESIRPTCKGWIYEWAEYPSNWRQQMTLDEFLKLKGIPGISGIDTRALTKIIR 121 

GINRDDYESI PTCKGW+ E + SNWR+QMTLD FLK+KGIPGISGIDTRALTKIIR 
Sbjct: 63 G INRDDYES I S PTCKGVWSEVSRLASNWRKQMTLDAFLKI KGI PGI SGIDTRALTKI IR 122 

Query: 122 KHGTMKACLINEGNSIHEALENLQKSVLLNDQIEQVSTKIAYASPGVGKNIVLVDFGLKH 181

+HGTMKA + ++G+SI + L+ +VL + IEQVSTK AY +PG+GKNIVLVDFGLKH 
Sbjct: 123 QHGTMKATMaDDGDSIQHLKDQLRATVLPmriEQVSTKTAYPAPGIGKNIVLVDFGLKH 182 

Query: 182 SILRELSQRQCHITWPHTTTAQEILNLNPDGVLLSNGPGNPEQLPNAIjQMIQEIQGKIP 241 
SILRE S+RQC+ITWP TA+E+L LNPDG++LSNGPGNPE LP AL MI+ +QGKIP

Sbjct: 183 SILREFSKRQCNITWPFNITAEEVICIOTDGIMjSNGPGNPEDLPEALDMIRGVQGKIP 242 

Query: 242 IFGICMGHQLFAKANGAKTYKMTFGHRGFNHAVRHLQTGQVDFTSQNHGYAVSREDFPEA 301 
IFGICMGHQLF+ ANGAKT KMTFGHRGFNHAVR + TG++DFTSQNHGYAV R P+ 
Sbjct: 243 IFGICMGHQLFSLANGAKTCKMTFGHRGFNHAVREIATGRIDFTSQNHGYAVERSSLPDT 302

Query: 302 LFITHEEINDKTVEGVRHKYYPAFSVQFHPDAAPGPHDTSYLFDEFINMIDDFQ 355 

L +THE+INDKTVEGV+H+ +PAFSVQFHPDAAPGPHD SYLFDEF+ MID ++ 
Sbjct: 303 LJWTHEDINDKTVEGVKHRDFPAFSVQFHPDAftPGPHDASYLFDEFLEMIDSWR 356 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 660 

A DNA sequence (GBSx0700) was identified in S.agalactiae <SEQ ID 2031> which encodes the amino 
acid sequence <SEQ ID 2032>. This protein is predicted to be aspartate carbamoyltransferase (pyrB).
Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3260 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF72727 GB:AF264709 aspartate transcarbamoylase [Enterococcus
faecalis]

Identities = 197/303 (65%), Positives = 250/303 (82%)

Query: 5 TQTLSLEHFVSLEELSNQEVMSLIKRSIEVKENPSNIGFDKDYYVSNLFFENSTRTHKSF 64
++ +SL+H ++ E L+++EVM LI+R+ E K+ ++ Y+ +NLFFENSTRTHKSF
Sbjct: 5 SERISLKHLLTAEALTDREVMGLIRRAGEFKQGMCWHPEERQYFATNLFFENSTRTHKSF 64

Query: 65 EMAELKLGLKTIEFNADTSSVNKGETLYDTILTMSALGLDVCVIRHPDIDYYKELIASPN 124
E+AE KLGL+ IEF A SSV KGETLYDT+LTMSA+G+DV VIRH +YY ELI S
Sbjct: 65 EVAEKKLGLEVIEFEASRSSVQKGETLYDTVLTMSAIGVDVAVIRHGKENYYDELIQSKT 124

Query: 125 IHSAIVNGGDGSGQHPSQSLLDLVTIYEEFGYFKGLKIAIVGDLTHSRVAKSNMQVLKRL 184
I +I+NGGDGSGQHP+Q LLDL+TIYEEFG F+GLK+AIVGD+THSRVAKSNMQ+L RL
Sbjct: 125 IQCSIINGGDGSGQHPTQCLLDLMTIYEEFGGFEGLKVAIVGDITHSRVAKSNMQLLNRL 184

Query: 185 GAEIFFSGPKEWYSSQFDEYGQYLPIDQLVDQIDVLMLLRVQHERHDGKGVFSKESYHQQ 244
GAEI+FSGP+EWY QFD YGQY+P+D++V+++DV+MLLRVQHERHDGK FSKE YH +
Sbjct: 185 GAEIYFSGPEEWYDHQFDVYGQYVPLDEIVEKVDVMMLLRVQHERHDGKESFSKEGYHLE 244

Query: 245 FGLTKERYKHLRDTAIIMHPAPVNRDVEIASDLVEADKARIVKQMSNGVYARIAILEAVL 304
+GLT ER L+ AIIMHPAPVNRDVE+A +LVE+ ++RIV QMSNGV+ R+AILEA+L
Sbjct: 245 YGLTNERATRLQKHAIMPAP\7NRDWIJ^KI J WSLQSRIVAQMSNGVFMRMAIIiEAIL 304

Query: 305 NSR 307
+ +
Sbjct: 305 HGK 307





A related DNA sequence was identified in S.pyogenes <SEQ ID 2033> which encodes the amino acid 
sequence <SEQ ID 2034>. Analysis of this protein sequence reveals the following: 
Possible site: 38

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 208/300 (69%), Positives = 249/300 (82%)

Query: 8 LSLEHFVSLEELSNQEVMSLIKRSIEVKENPSNIGFDKDYYVSNLFFENSTRTHKSFEMA 67 

++L + VS+E L+ +EV+ LI R E K I + V+NLFFENSTRTHKSFE+A 

Sbjct: 26 VALTNLVSMEALTTEEVLGLINRGSEYKAGKWISDHQKDLVANLFFENSTRTHKSFEVA 85 



Query: 68 ELKLGLKTIEFNADTSSVNKGETLYDTILTMSALGLDVCVIRHPDIDYYKELIASPNIHS 127 

E KLGL ++FNAD S+VNKGE+LYDT+LTMSALG D+CVIRHP+ DYYKEL+ SP I + 
Sbjct: 86 EKKLGLTVLDFNADASAVNKGESLYDT^TMSALGTDICVIRHPEDDYYKEL.VESPTITA 145 

Query: 128 AIVNGGDGSGQHPSQSLLDLVTIYEEFGYFKGLKIAIVGDLTHSRVAKSNMQVLKRLGAE 187 

+IVNGGDGSGQHPSQ LLDL+TIYEEFG F+GLKIAI GDLTHSRVAKSNMQ+LKRLGAE 
Sbjct: 146 SIVNGGDGSGQHPSQCLLDLLTIYEEFGRFEGLKIAIAGDLTHSRVAKSNMQILKRLGAE 205 

Query: 188 IFFSGPKEWYSSQFDEYGQYLPIDQLVDQIDVLMLLRVQHERHDGKGVFSKESYHQQFGL 247

++F GP+EWYS F+ YG Y+ IDQ++ ++DVLMLLRVQHERHDG FSKE YHQ FGL 
Sbjct: 206 LYFYGPEEWYSEAFNAYGTYIAIDQIIKELDVLMLLRVQHERHDGHQSFSKEGYHQAFGL 265 

Query: 248 TKERYKHLRDTAIIMHPAPVNRDVEIASDLVEADKARIVKQMSNGVYARIAILEAVLNSR 307 

T+ERY+ L+D+AI IMHPAP VNRDVEIA LVEA KARIV QM+NGV+ R+AI+EA+LN R 
Sbjct: 266 TQERYQQLKDSAIIMHPAPVNRDVEIADSLVF^KARIVSQMANGVFVRMAIIEAILNGR 325 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 661

A DNA sequence (GBSx0701) was identified in S.agalactiae <SEQ ID 2035> which encodes the amino 
acid sequence <SEQ ID 2036>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2392 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC06948 GB:AE000708 dihydroorotase [Aquifex aeolicus] 
Identities = 176/422 (41%), Positives = 255/422 (59%), Gaps = 8/422 (1%)

Query: 11 IIKNGLIIDPQSGFNQVSDMLIDQGKIKQISKEIDIKGIPIIDASNKIVAPGLVDIHVHF 70 

I+KNG +IDP D+L++ GKIK+I tit I IDA IV PG +DIHVH 

Sbjct: 5 IVKNGYVIDPSQNLEGEFDILVENGKIKKIDKNILVPEAEIIDAKGLIVCPGFIDIHVHL 64 

Query: 71 REPGQTHKENIHTGALSAAVGGFTTVLMMANTNPTISSPEIVKQVKESAAKFAI-KIETV 129 

R+PGQT+KE+I +G+ A GGFTT++ M NTNP I + +V + + + + ++ 
Sbjct: 65 RDPGQTYKEDIESGSRCAVAGGFTTIVCMPNTNPPIDNTTVVNYILQKSKSVGLCRVLPT 124 

Query: 130 ATITKSLNGKDLVNFEELLFAGVAGFSDrXSIPLTDTKVLQFJU^NIARKHDVVLSLHEEDP 189 

TITK GK++ +F L EAG F+DDG P+ D+ V+++A+ LA + V + H ED 
Sbjct: 125 GTITKGRKGKEIADFYSLKEAGCVAFTDDGSPVMDSSVMRKALELASQLGVPIMDHCEDD 184 

Query: 190 SLN-GVLGINEHIAQKIYHVCGASGIiAEySMIARDAMIAYQTQAKVHIQHLSSSESVEVV. 248 

L GV INE + + + AE IARD ++A +T VHIQH+S+ S+E++ 

Sbjct: 185 KLAYGV- - INEGEVSALIiGLSSRAPEAEEIQIARDGILAQRTGGHVHIQHVSTKLSLEII 242 

Query: 249 DFAQKLGANLTAEWPQHFSKTENLLLTKGaNAKLNPPLRLEKDRQALIDGLKSGVISII 308 

+F ++ G +T EV P H TE +L GANA+ +NPPLR ++DR ALI+G+K G+I 
Sbjct: 243 EFFKEKGVKITCEVNPNHLLFTEREVLNSGANARVNPPLRKKEDRLALIEGVKRGIIDCF 302 

Query: 309 ASDHAPHHIMEKAADNISQAPSGMTGLETSLALGITYLVSTKELSMIDFLAKMTCNPAQL 368 

A+DHAPH EK + + A G+ GL+T+L + L +S+ + T NPA++ 

Sbjct: 303 ATDHAPHQTFEK- -ELVEFAMPGIIGLQTALPSALE-LYRKGIISLKKLIEMFTINPARI 359 

Query: 369 YGFDAGYLREGGPADIVIFDQAEERIIKAEF-ASKSSNSPFIGDKLKGVIHYTICNGEIV 427 

G D G B+ G PADI IFD +E 1+ E SKS N+P G LKG + YTI +G++V 
Sbjct: 360 IGVDLGTLKLGSPADITIFDPNKEWILNEETNLSKSRNTPLWGKVLKGKVIYTIKDGKMV 419 

Query: 428 YQ 429 
Y+ 

Sbjct: 420 YK 421 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2037> which encodes the amino acid
sequence <SEQ ID 2038>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 76 - 92 ( 76 - 92) 
INTEGRAL Likelihood = -0.00 Transmembrane 286 - 302 ( 286 - 302) 
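
The INTEGRAL lines above list a likelihood value followed by the residue range of each predicted transmembrane segment. As a hedged illustration (the `Segment` type and `parse_segments` helper are hypothetical names introduced here, not part of the original analysis), the lines can be parsed in Python and the length of each segment computed from its start and end positions:

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment

    @property
    def length(self) -> int:
        # Residue ranges are inclusive on both ends.
        return self.end - self.start + 1


# Assumed layout: "INTEGRAL Likelihood = -0.80 Transmembrane 76 - 92 ( 76 - 92)".
# Only the first range is captured; the parenthesised range is ignored here.
SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> List[Segment]:
    """Return one Segment per INTEGRAL line found in the text."""
    return [
        Segment(float(lik), int(start), int(end))
        for lik, start, end in SEGMENT_RE.findall(text)
    ]


segments = parse_segments(
    """
INTEGRAL Likelihood = -0.80 Transmembrane 76 - 92 ( 76 - 92)
INTEGRAL Likelihood = -0.00 Transmembrane 286 - 302 ( 286 - 302)
"""
)
for seg in segments:
    print(seg.start, seg.end, seg.length, seg.likelihood)
```

Both segments listed for this protein span 17 residues (76 to 92 and 286 to 302).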



Final Results 

bacterial membrane Certainty=0.132 (Affirmative) < succ>

bacterial outside Certainty=0.000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

!GB:AE000708 dihydroorotase [Aquifex aeolicus] 316 3e-85

>GP:AAC06948 GB:AE000708 dihydroorotase [Aquifex aeolicus] 

Score = 316 bits (801), Expect = 3e-85

Identities = 177/422 (41%), Positives = 254/422 (59%), Gaps = 8/422 (1%)

Query: 2 ILIKNGRVMDPKSQRDQVADVLIDGKQIVKIASAIECQEAQVIDASGLIVAPGLVDIHVH 61 
+++KNG V+DP + D+L++ +1 KI I EA++IDA GLIV PG +DIHVH 
Sbjct: 4 LIVKNGYVIDPSQNLEGEFDILVENGKIKKIDKNILVPEAEIIDAKGLIVCPGFIDIHVH 63

Query: 62 FREPGQTHKEDIHTGAIAAAAGGVTTVVMMAlsnMPVISDvETLQEvIiASAAKEKI-HiyT 120 

R+PGQT+KEDI +G+ A AGG TT+V M NTNP I + + +L + + + 
Sbjct: 64 LRDPGQTYKEDIESGSRCAVAGGFTTIVCMPNTNPPIDNTTVVNYILQKSKSVGLCRVLP 123 


Query: 121 NASVTQAFNGKDVTDFKALLEAGAVSFSDDGIPLESSKVLKEAFDLANANQTFISLHEED 180 

++T+ GK++ DF +L EAG V+F+DDG P+ S V+++A +LA+ I H ED 

Sbjct: 124 TGTITKGRKGKEIADFYSLKEAGCVAFTDDGSPVMDSSVMRKALELASQLGVPIMDHCED 183 

Query: 181 PQL-NGVLGFNEGIAEEHFHFCGATGVAEYSMIARDVMIAYDRQAHVHIQHLSKAESVQV 239 

+L GV+ NEG AE IARD ++A HVHIQH+S S+++ 

Sbjct: 184 DKLAYGVI--NEGEVSALLGLSSRAPEAEEIQIARDGILAQRTGGHVHIQHVSTKLSLEI 241 

Query: 240 VAFAQQLGAKVTAEVSPQHFSTTEDLLLIAGTSAKMNPPLRTQRDRLAVIEGLKSGVITV 299 
+ F ++ G K+T EV+P H TE +L +G +A++NPPLR + DRLA+IEG+K G+I 

Sbjct: 242 IEFFKEKGVKITCEVNPNHLLFTEREVLNSGANARVNPPLRKKEDRLALIEGVKRGIIDC 301 

Query: 300 IATDHAPHHKDEKTVDDMTKAPSGMTGLETSLSLGLTHLVEPGHLTLMSLLEKMTLNPAL 359 
ATDHAPH EK + + A G+ GL+T+L L L G ++L L+E T+NPA 
Sbjct: 302 FATDHAPHQTFEKELVEF--AMPGIIGLQTALPSAL-ELYRKGIISLKKLIEMFTINPAR 358 

Query: 360 LYGFDAGYLAENGPADLVIFADKQERLITENF-ASKASNSPFIGNKLKGVVKYTIADGEV 418 

+ G D G L PAD+ IF +E ++ E SK+ N+P G LKG V YTI DG++ 
Sbjct: 359 IIGVDLGTLKLGSPADITIFDPNKEWILNEETNLSKSRNTPLWGKVLKGKVIYTIKDGKM 418 


Query: 419 VY 420 
VY 

Sbjct: 419 VY 420 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 269/420 (64%) , Positives = 338/420 (80%) 

Query: 9 MYIIKNGLIIDPQSGFNQVSDMLIDQGKIKQISKEIDIKGIPIIDASNKIVAPGLVDIHV 68 
M +IKNG ++DP+S +QV+D+LID +I +I+ I+ + +IDAS IVAPGLVDIHV 
Sbjct: 1 MILIKNGRVMDPKSQRDQVADVLIDGKQIVKIASAIECQEAQVIDASGLIVAPGLVDIHV 60 

Query: 69 HFREPGQTHKENIHTGALSAAVGGFTTVTiMMANTNPTISSPEIVKQVKESAAKEAIKIET 128 

HFREPGQTHKE+IHTGAL+AA GG TTV+MMANTNP IS E +++V SAAKE I I T 
Sbjct: 61 HFREPGQTHKEDIHTGAIAAAAGGVTTVVMMANTNPVISDVETLQEVLASAAKEKIHIYT 120 


Query: 129 VATITKSLNGKDLVNFEELLEAGVAGFSDDGIPLTDTKVLQEAMNLARKHDVVLSLHEED 188 

A++T++ NGKD+ +F+ LLEAG FSDDGIPL +KVL+EA +LA + +SLHEED 
Sbjct: 121 NASVTQAFNGKDVTDFKALLEAGAVSFSDDGIPLESSKVLKEAFDLANANQTFISLHEED 180 

Query: 189 PSLNGVLGINEHIAQKIYHVCGASGLAEYSMIARDAMIAYQTQAKVHIQHLSSSESVEVV 248 

P LNGVLG NE IA++ +H CGA+G+AEYSMIARD MIAY QA VHIQHLS +ESV+VV 
Sbjct: 181 PQLNGVLGFNEGIAEEHFHFCGATGVAEYSMIARDVMIAYDRQAHVHIQHLSKAESVQVV 240 

Query: 249 DFAQKLGANLTAEVTPQHFSKTENLLLTKGANAKLNPPLRLEKDRQALIDGLKSGVISII 308 
FAQ+LGA +TAEV+PQHFS TE+LLL G +AK+NPPLR ++DR A+I+GLKSGVI++I 

Sbjct: 241 AFAQQLGAKVTAEVSPQHFSTTEDLLLIAGTSAKMNPPLRTQRDRLAVIEGLKSGVITVI 300 

Query: 309 ASDHAPHHIMEKAADNISQAPSGMTGLETSLALGITYLVSTKELSMIDFLAKMTCNPAQL 368 
A+DHAPHH EK D++++APSGMTGLETSL+LG+T+LV L+++ L KMT NPA L 
Sbjct: 301 ATDHAPHHKDEKTVDDMTKAPSGMTGLETSLSLGLTHLVEPGHLTLMSLLEKMTLNPALL 360 

Query: 369 YGFDAGYLREGGPADIVIFDQAEERIIKAEFASKSSNSPFIGDKLKGVIHYTICNGEIVY 428 



